[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787591#action_12787591 ]
Robert Muir commented on LUCENE-1377:
-------------------------------------

Yonik, I suppose what I am suggesting is a way to make this easier. Isn't this one of the things hindering adoption of Lucene 3.x in Solr?

I think it is silly that Lucene has Pattern-based tokenization, but Solr has a separate impl which is better.
I think it is silly that Lucene has synonym support, but Solr has a separate impl which is better.
I think it is silly that Lucene has WordNet support, but the right pieces are not exposed so they can be used in Solr (for its better synonym support).
I think it is terrible that people post to the Lucene user list asking how to tokenize Hindi (or other complex scripts), when whitespace + worddelimiter works very well for the time being.

I think I could go on and on, but we should remove this duplicated effort and try to keep things simpler.

For one, I do not want to break things in Solr with a Lucene update. This is easier if the analysis components are consolidated.

> Add HTMLStripReader and WordDelimiterFilter from SOLR
> -----------------------------------------------------
>
> Key: LUCENE-1377
> URL: https://issues.apache.org/jira/browse/LUCENE-1377
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 2.3.2
> Reporter: Jason Rutherglen
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very
> useful for a wide variety of use cases. It would be good to place them into
> core Lucene.