: working with such a setup for a long time now). Integrating it into an
: Analyzer should be fairly simple as Boilerpipe can return a string which
: in turn can be parsed just like any other text.
Treating the boilerplate removal library as a black-box String->String transformation seems fairly trivial, and could easily be done by Java applications prior to constructing an Analyzer (i.e. String->[boilerblackbox]->String->[Analyzer]->TokenStream).

Where things would probably get more complicated is trying to maintain term position information from the original source text (for things like search result highlighting and whatnot), which would probably require doing the boilerplate removal via something like the CharFilter abstraction (or directly in a tokenizer).

Does the code as currently implemented maintain position mapping information?

-Hoss
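
To make the position-mapping concern concrete, here is a rough sketch in plain Java of the bookkeeping a CharFilter-style approach would need. It is not Lucene or Boilerpipe code: the class OffsetMappedText, its strip() and correctOffset() methods, and the idea that the boilerplate remover can report the [start, end) spans it dropped are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Lucene or Boilerpipe): removes spans from a
 * text and remembers how to map offsets in the filtered text back to offsets
 * in the original text.
 */
class OffsetMappedText {
    final String filtered;
    // Parallel lists: from filtered offset starts.get(i) onward, the original
    // offset is the filtered offset plus diffs.get(i).
    private final List<Integer> starts;
    private final List<Integer> diffs;

    private OffsetMappedText(String filtered, List<Integer> starts, List<Integer> diffs) {
        this.filtered = filtered;
        this.starts = starts;
        this.diffs = diffs;
    }

    /**
     * @param removed sorted, non-overlapping [start, end) spans of the original
     *                text that the (assumed) boilerplate remover discarded
     */
    static OffsetMappedText strip(String original, int[][] removed) {
        StringBuilder out = new StringBuilder();
        List<Integer> starts = new ArrayList<>();
        List<Integer> diffs = new ArrayList<>();
        int pos = 0;        // read position in the original text
        int cumulative = 0; // total number of characters removed so far
        for (int[] span : removed) {
            out.append(original, pos, span[0]); // keep text before the span
            cumulative += span[1] - span[0];
            starts.add(out.length());           // correction applies from here on
            diffs.add(cumulative);
            pos = span[1];                      // skip the removed span
        }
        out.append(original, pos, original.length());
        return new OffsetMappedText(out.toString(), starts, diffs);
    }

    /** Maps an offset in the filtered text back to the original text. */
    int correctOffset(int filteredOffset) {
        int diff = 0;
        for (int i = 0; i < starts.size(); i++) {
            if (starts.get(i) <= filteredOffset) {
                diff = diffs.get(i);
            } else {
                break;
            }
        }
        return filteredOffset + diff;
    }
}
```

For example, stripping the spans {0, 6} and {17, 26} from "NAV | Hello world | FOOTER" leaves "Hello world", and the start offset of "world" in the filtered text (6) maps back to 12 in the original. The sketch only handles token start offsets cleanly; an end offset that falls exactly on a cut boundary is mapped past the removed span, so a real implementation would need more care there.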
