[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-3508:
----------------------------------

    Attachment: LUCENE-3508.patch

Attached you will find a new patch for trunk. I made some improvements to the copy operations and CompoundTokenClass:

- Copy operations no longer create useless String objects or clones of String's internal char[] (these slowed down indexing a lot).
- The algorithmic hyphenator uses CTA's char[] directly, as it did for Token before (see above), and uses the optimized append().
- The broken, non-Unicode-conformant lowercasing was removed; instead, the CharArraySet is created case-insensitive. If you pass in your own CharArraySet, it has to be case-insensitive; if it is not, the filter will fail (what to do? Robert, how do we handle that otherwise?).
- As all tokens are CTAs again, the CAS lookup is fast again.
- Some cleanup of whitespace in the test and of leftovers in the base source file (Lucene requires 2 spaces, no tabs).

Robert, if you could look into it, it would be great. I did not test it with Solr, but to me it looks correct.

Uwe

> Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3508
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3508
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 3.4, 4.0
>            Reporter: Spyros Kapnissis
>            Assignee: Uwe Schindler
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3508.patch, LUCENE-3508.patch
>
>
> The CompoundWordTokenFilterBase.setToken method will call clearAttributes()
> and then will reset only the default Token attributes (term, position, flags,
> etc), resulting in any custom attributes losing their value. Commenting out
> clearAttributes() seems to do the trick, but will fail the
> TestCompoundWordTokenFilter tests.

--
This message is automatically generated by JIRA.
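For readers who want to see the reported failure mode in isolation, here is a minimal sketch in plain Java. It does not use Lucene and is not taken from the attached patch: the Attributes class, the attribute names ("term", "positionIncrement", "partOfSpeech") and both emit methods are hypothetical stand-ins. It only models the difference between clearing all attributes before emitting a subword and restoring a snapshot of the original token's state first, which is one possible way to keep custom attributes alive.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of attribute handling in a decompounding filter; NOT Lucene's API. */
public class AttributeStateSketch {

    /** Hypothetical stand-in for an attribute source: a bag of named values. */
    public static final class Attributes {
        private final Map<String, Object> values = new HashMap<>();

        public void set(String name, Object value) { values.put(name, value); }

        public Object get(String name) { return values.get(name); }

        /** Resets every attribute, including ones the filter knows nothing about. */
        public void clear() { values.clear(); }

        /** Takes a snapshot of all current attribute values. */
        public Map<String, Object> capture() { return new HashMap<>(values); }

        /** Replaces the current values with a snapshot taken by capture(). */
        public void restore(Map<String, Object> state) {
            values.clear();
            values.putAll(state);
        }
    }

    /** Models the reported behaviour: clear everything, then set only the known attributes. */
    public static void emitSubwordClearing(Attributes atts, String subword) {
        atts.clear();
        atts.set("term", subword);
        atts.set("positionIncrement", 0);
    }

    /** Assumed alternative: restore the original token's state, then overwrite known attributes. */
    public static void emitSubwordRestoring(Attributes atts, Map<String, Object> original, String subword) {
        atts.restore(original);
        atts.set("term", subword);
        atts.set("positionIncrement", 0);
    }

    public static void main(String[] args) {
        Attributes atts = new Attributes();
        atts.set("term", "Rindfleisch");
        atts.set("positionIncrement", 1);
        atts.set("partOfSpeech", "NOUN"); // custom attribute set by an earlier filter
        Map<String, Object> original = atts.capture();

        emitSubwordClearing(atts, "Rind");
        System.out.println("after clearing:  partOfSpeech=" + atts.get("partOfSpeech"));

        emitSubwordRestoring(atts, original, "Rind");
        System.out.println("after restoring: partOfSpeech=" + atts.get("partOfSpeech"));
    }
}
```

In this toy model the custom value is gone after the clearing variant and present again after the restoring variant; whether a real filter should copy custom attributes onto subword tokens at all is a design decision the sketch does not settle.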