Hi,

To use Boilerpipe we need LinkContentHandler (LinkCH), BoilerpipeContentHandler
and TeeContentHandler in Tika. LinkCH returns all URLs with some metadata such
as the title. Fixes for old parsers such as Neko would then be obsolete.

I propose to rely on Tika for all outlinks. Right now this means that not all
link types are returned, for example area, form and so on. Is this a big
problem? Rel is also not returned, but I patched Tika to do that, so we can
still do something with nofollow, which is important.
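
To illustrate why keeping rel matters, here is a minimal sketch of dropping
nofollow links from a list of outlinks. The Outlink and NofollowFilter classes
are hypothetical stand-ins, not Tika or Nutch classes; the sketch only assumes
that each link carries its raw rel attribute value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class NofollowFilter {

    /** Hypothetical stand-in for a parsed link; not a Tika or Nutch class. */
    public static final class Outlink {
        public final String url;
        public final String rel; // raw rel attribute value, may be null

        public Outlink(String url, String rel) {
            this.url = url;
            this.rel = rel;
        }
    }

    /** True if rel contains the token "nofollow" (space separated, case-insensitive). */
    public static boolean isNofollow(String rel) {
        if (rel == null) {
            return false;
        }
        for (String token : rel.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals("nofollow")) {
                return true;
            }
        }
        return false;
    }

    /** Returns only the links that are not marked nofollow, in their original order. */
    public static List<Outlink> followable(List<Outlink> links) {
        List<Outlink> kept = new ArrayList<>();
        for (Outlink link : links) {
            if (!isNofollow(link.rel)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Outlink> links = Arrays.asList(
                new Outlink("http://example.org/a", null),
                new Outlink("http://example.org/b", "nofollow"),
                new Outlink("http://example.org/c", "external"));
        for (Outlink link : followable(links)) {
            System.out.println(link.url);
        }
    }
}
```

Without rel on the link there is nothing for a check like isNofollow to look
at, which is why the Tika patch is needed.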

Thanks

-- 
Markus Jelsma - CTO - Openindex
