Tika has boilerpipe, which is not bad for web pages in general. I have
a port of readability, which is better than boilerpipe for news
articles in particular. It seems to me that I should investigate
whether Tika has room for both.

On Thu, Nov 11, 2010 at 4:04 PM, Ted Dunning <[email protected]> wrote:
> I believe that this is included in Tika now (according to Ken Krugler)
>
> On Thu, Nov 11, 2010 at 12:37 PM, Isabel Drost <[email protected]> wrote:
>
>> ...
>>
>> As a side note - a project with similar goals was mentioned on the Lucene
>> mailing lists a while ago: http://code.google.com/p/boilerpipe/
>>
>> Cheers,
>> Isabel
>>
>
