+1 from me -- those 3 Tika content handlers should take care of it...

Cheers,
Chris

On Dec 21, 2011, at 6:51 AM, Markus Jelsma wrote:

> Hi,
> 
> For using Boilerpipe we need LinkCH, BoilerpipeCH and TeeCH in Tika. LinkCH 
> returns all URLs with some metadata such as title etc. Fixes for old parsers 
> such as Neko are then obsolete.
> 
> I propose to rely on Tika for all outlinks. Right now this means not all types 
> are returned, such as area, form and whatever else. Is this a big problem? Rel 
> is also not returned, but I patched Tika to do that so we can still do 
> something with nofollow, which is important.
> 
> Thanks
> 
> -- 
> Markus Jelsma - CTO - Openindex
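
For readers unfamiliar with the handler pattern discussed above, here is a rough, JDK-only sketch of the idea: one handler collects links (including rel), another collects text, and a tee forwards every SAX event to both. The class names (OutlinkSketch, LinkCollector, TextCollector, Tee) are hypothetical and are not Tika's classes; the sketch also assumes well-formed XHTML input, which real pages often are not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class OutlinkSketch {

    /** One outlink: target URL, rel attribute ("" if absent), anchor text. */
    public static final class Link {
        public final String href, rel, text;
        Link(String href, String rel, String text) {
            this.href = href; this.rel = rel; this.text = text;
        }
        /** True if the space-separated rel value contains "nofollow". */
        public boolean isNoFollow() {
            for (String token : rel.toLowerCase().split("\\s+")) {
                if (token.equals("nofollow")) return true;
            }
            return false;
        }
    }

    /** Links plus plain text gathered in a single parse. */
    public static final class Result {
        public final List<Link> links;
        public final String text;
        Result(List<Link> links, String text) { this.links = links; this.text = text; }
    }

    /** Hypothetical link handler: records every <a href=...> with its rel. */
    static final class LinkCollector extends DefaultHandler {
        final List<Link> links = new ArrayList<>();
        private String href, rel;
        private StringBuilder anchor; // non-null only while inside an <a href>
        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("a".equalsIgnoreCase(qName) && atts.getValue("href") != null) {
                href = atts.getValue("href");
                rel = atts.getValue("rel") == null ? "" : atts.getValue("rel");
                anchor = new StringBuilder();
            }
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (anchor != null) anchor.append(ch, start, length);
        }
        @Override public void endElement(String uri, String local, String qName) {
            if ("a".equalsIgnoreCase(qName) && anchor != null) {
                links.add(new Link(href, rel, anchor.toString().trim()));
                anchor = null;
            }
        }
    }

    /** Hypothetical text handler: concatenates all character data. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        @Override public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Hypothetical tee: forwards each event to every wrapped handler. */
    static final class Tee extends DefaultHandler {
        private final DefaultHandler[] targets;
        Tee(DefaultHandler... targets) { this.targets = targets; }
        @Override public void startElement(String u, String l, String q, Attributes a) throws SAXException {
            for (DefaultHandler h : targets) h.startElement(u, l, q, a);
        }
        @Override public void endElement(String u, String l, String q) throws SAXException {
            for (DefaultHandler h : targets) h.endElement(u, l, q);
        }
        @Override public void characters(char[] ch, int s, int len) throws SAXException {
            for (DefaultHandler h : targets) h.characters(ch, s, len);
        }
    }

    /** Parses well-formed XHTML once and returns both links and text. */
    public static Result parse(String xhtml) {
        LinkCollector links = new LinkCollector();
        TextCollector text = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new Tee(links, text));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return new Result(links.links, text.text.toString());
    }
}
```

A caller could then skip links where isNoFollow() returns true when building the outlink list. Only a elements are handled here, which mirrors the limitation noted above that area, form and other link types are not returned.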


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
