Thanks for the quick answer.

> As for NUTCH-490, I haven't taken an in-depth look at it, but I don't
> see the point of it. Why not just use HtmlParseFilters since you have
> access to the DOM object? What advantage do neko filters have? Also,
> having an extension point for a library possibly used by a possibly
> used plugin looks really really wrong from a design point.

In my case I want to achieve two things:
1. Ensure there is always a TBODY element.
2. Drop all SELECT elements (I don't want them to be in 

You are right, I could manipulate the DOM for this. But filters seem to be a 
less costly operation, which is why I took this approach first. I haven't done 
any tests, though, so maybe I'm too concerned and it doesn't matter that much.
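
For comparison, here is a minimal sketch of what the DOM-based alternative could look like, using only the JDK's org.w3c.dom API. It is not taken from Nutch or from the NUTCH-490 patch: the class name DomCleanup, its methods, and the sample document are hypothetical, and it assumes upper-case element names and that only direct TR children of a TABLE need wrapping.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch. Assumes upper-case element names.
class DomCleanup {

    // Copy a live NodeList so that modifying the tree does not disturb iteration.
    static List<Node> snapshot(NodeList live) {
        List<Node> copy = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            copy.add(live.item(i));
        }
        return copy;
    }

    // Goal 2: remove every SELECT element (with its subtree) below root.
    static void dropSelects(Element root) {
        for (Node select : snapshot(root.getElementsByTagName("SELECT"))) {
            Node parent = select.getParentNode();
            if (parent != null) {
                parent.removeChild(select);
            }
        }
    }

    // Goal 1: give every TABLE without a direct TBODY child a new TBODY
    // and move the table's direct TR children into it.
    static void ensureTbody(Document doc) {
        for (Node node : snapshot(doc.getElementsByTagName("TABLE"))) {
            Element table = (Element) node;
            boolean hasTbody = false;
            List<Node> rows = new ArrayList<>();
            for (Node c = table.getFirstChild(); c != null; c = c.getNextSibling()) {
                if ("TBODY".equals(c.getNodeName())) {
                    hasTbody = true;
                } else if ("TR".equals(c.getNodeName())) {
                    rows.add(c);
                }
            }
            if (hasTbody) {
                continue;
            }
            Element tbody = doc.createElement("TBODY");
            for (Node row : rows) {
                tbody.appendChild(row); // appendChild moves the row out of the table
            }
            table.appendChild(tbody);
        }
    }

    // Builds HTML > BODY > TABLE > TR > TD > SELECT > OPTION for illustration.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element parent = null;
            for (String name : new String[] {"HTML", "BODY", "TABLE", "TR", "TD", "SELECT", "OPTION"}) {
                Element e = doc.createElement(name);
                if (parent == null) {
                    doc.appendChild(e);
                } else {
                    parent.appendChild(e);
                }
                parent = e;
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        dropSelects(doc.getDocumentElement());
        ensureTbody(doc);
        System.out.println("SELECT left: " + doc.getElementsByTagName("SELECT").getLength());
        System.out.println("TBODY count: " + doc.getElementsByTagName("TBODY").getLength());
    }
}
```

Both passes walk the whole tree after parsing has finished, which is the extra cost I was worried about. I have not measured it.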

Or possibly my suggestion to make an extension point for the parser is the 
best one? Then if you want to modify the parsing itself, you can do whatever 
you want. You also wouldn't need a switch for the parsing implementation, as 
there is now; you would simply change which plugin is included. I could do it, 
if you think it is a good idea.

Anyway, as I said, if you find that none of this makes much sense, just close 
the issue with this comment. I really prefer that over a hanging request, 
because then I know I should think of a different solution for my case.

Thanks,
Marcin Okraszewski
