Hi Talat,

On Sat, May 3, 2014 at 4:35 AM, <[email protected]> wrote:

>
> The nekohtml parser used by our current parser plugins doesn't parse correctly.


What is wrong with it? Are there any issues in Jira to back this up?


> When I tested it
> on a huge website, it left HTML tags behind.


Pretty vague. Anything else? Any more details? Can this be implemented in
existing parser plugins?


> IMHO our parser is a little
> bit old.


Which one? Is it possible to upgrade? I don't know which parser you mean.


> After doing some research, I found the Jsoup[1] and Gumbo[2]
> parsers. I did some tests on broken websites and saw that Gumbo and Jsoup
> parsed them very similarly to Google's parser.
>

So what are the benefits? If we have a clear-cut argument then let's go for
it. If not, then maybe your time would be better invested elsewhere. It's up
to you I suppose :)
