Hi all,

The NekoHTML parser plugin we currently use does not parse pages
correctly. When I tested it on a large website, it left HTML tags
behind. IMHO our parser is a little bit old. After doing some
research, I found the jsoup [1] and Gumbo [2] parsers. I ran some
tests on websites with broken HTML, and saw that Gumbo and jsoup
parse them very similarly to Google's parser.

WDYT?

[1] http://jsoup.org/
[2] https://github.com/google/gumbo-parser

-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
