Re: Announcement: Boilerplate removal library

Chris Hostetter Mon, 14 Dec 2009 14:52:26 -0800

: working with such a setup for a long time now). Integrating it into an 
: Analyzer should be fairly simple as Boilerpipe can return a string which 
: in turn can be parsed just like any other text.


Treating the boilerplate removal library as a black box String->String 
transformation seems fairly trivial and could easily be done by 
Java applications prior to constructing an Analyzer (i.e.: 
String->[boilerblackbox]->String->[Analyzer]->TokenStream)
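
A minimal sketch of that black-box pipeline, with stand-ins only: the
boilerplate remover is any String->String function supplied by the caller,
and analyze() is a toy whitespace/lowercase splitter standing in for a real
Analyzer. None of these names come from Boilerpipe or Lucene.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical illustration; not Boilerpipe's or Lucene's API. */
public class BlackBoxPipelineSketch {

    /** Stand-in for an Analyzer: lowercases and splits on whitespace. */
    public static List<String> analyze(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** String -> [black box] -> String -> [analyzer stand-in] -> tokens. */
    public static List<String> index(String raw, UnaryOperator<String> boilerplateRemover) {
        String cleaned = boilerplateRemover.apply(raw);
        return analyze(cleaned);
    }
}
```

The analysis step only ever sees the cleaned string, so nothing in the
resulting tokens can point back into the raw input.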

Where things would probably get more complicated is trying to maintain 
term position information from the original source text (for 
things like search result highlighting and whatnot), which would probably 
require doing the boilerplate removal via something like the CharFilter 
abstraction (or directly in a tokenizer).
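
To illustrate what maintaining that mapping could look like, here is a rough
sketch in plain Java. It assumes the boilerplate regions are already known as
character ranges, and it records the original offset of every kept character
so that token offsets in the filtered text can be translated back. The class
and method names are made up for illustration; the idea is only loosely
analogous to the offset correction a CharFilter performs.

```java
import java.util.Arrays;

/** Hypothetical illustration; not Boilerpipe's or Lucene's API. */
public class OffsetMappingSketch {

    /** Filtered text plus the original offset of each kept character. */
    public static final class Filtered {
        public final String text;
        private final int[] originalOffsets;

        Filtered(String text, int[] originalOffsets) {
            this.text = text;
            this.originalOffsets = originalOffsets;
        }

        /** Maps an inclusive start offset in the filtered text to the original text. */
        public int correctStart(int filteredStart) {
            return originalOffsets[filteredStart];
        }

        /** Maps an exclusive end offset in the filtered text to the original text. */
        public int correctEnd(int filteredEnd) {
            // One past the original position of the last character covered.
            return filteredEnd == 0 ? 0 : originalOffsets[filteredEnd - 1] + 1;
        }
    }

    /**
     * Drops the given half-open ranges [start, end) from the text.
     * Assumes the ranges are sorted ascending and do not overlap.
     */
    public static Filtered removeRanges(String original, int[][] boilerplateRanges) {
        StringBuilder kept = new StringBuilder();
        int[] offsets = new int[original.length()];
        int keptCount = 0;
        int r = 0;
        for (int i = 0; i < original.length(); i++) {
            // Skip past ranges that end at or before the current position.
            while (r < boilerplateRanges.length && i >= boilerplateRanges[r][1]) {
                r++;
            }
            boolean inBoilerplate = r < boilerplateRanges.length && i >= boilerplateRanges[r][0];
            if (!inBoilerplate) {
                kept.append(original.charAt(i));
                offsets[keptCount++] = i;
            }
        }
        return new Filtered(kept.toString(), Arrays.copyOf(offsets, keptCount));
    }
}
```

With the input "NAV hello world FOOT" and the ranges [0,4) and [15,20)
removed, the filtered text is "hello world"; the token "world" sits at
filtered offsets 6 to 11, which map back to 10 to 15 in the original, so a
highlighter could mark the right span of the source document.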

Does the code as currently implemented maintain position 
mapping information?


-Hoss
