I just wrote a Java port of the Arc90 Labs 'Readability' JavaScript code,
which has a very strong reputation as a tool for extracting the useful
content from news-style web pages.

I could contribute it to Tika, if (a) you want it, and (b) there is
some reasonable way to decide or configure which extractor to use.
