: SOLR has two classes HTMLStripReader and WordDelimiterFilter which are
: very useful for a wide variety of use cases. It would be good to place
: them into core Lucene.

FWIW: Just about every concrete TokenFilter and Tokenizer in Solr's code base could, and probably should, be promoted up into Lucene-Java, at the very least into a contrib if not into the "core".

A big reason why there hasn't been any movement to do this in many cases is refactoring the test cases: most Solr tests use the Solr TestHarness to test things at a very high level, black box style, so essentially all new test cases would be needed. (In other cases there are no test cases, but they were committed to Solr anyway to scratch an itch.)

The best approach for dealing with things like this is probably to track each individual piece that people want to promote in separate Jira issues with separate patches. That way, if someone does write good generalized unit tests for WordDelimiterFilter but not HTMLStripReader (for example), the issues remain detangled and one can be committed before the other. (Smaller, more self-contained patches are a lot easier to review and commit.)

-Hoss
