[ https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853905#action_12853905 ]
Robert Muir commented on LUCENE-2302:
-------------------------------------

bq. Token now implements CharSequence but violates its contract

I don't think this is correct. I don't care at all about Token's toString() method, but I do care that the analysis API isn't broken. If Token violates the CharSequence contract, the analysis API is completely wrong when using a Token attribute factory.

In my opinion, we should do one of the following two things in the backwards compatibility section, but not break the analysis API:
# Token and TokenAttributeFactory were completely removed due to their backwards compatibility problems.
# Token's toString() method was changed to match the CharSequence interface.

> Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2302
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch
>
>
> For flexible indexing, terms can be simple byte[] arrays, while the current TermAttribute only supports char[]. This is fine for plain text, but e.g. NumericTokenStream should work directly on the byte[] array.
> TermAttribute also lacks some interfaces that would make it simpler for users to work with: Appendable and CharSequence.
> I propose to create a new interface "CharTermAttribute" with a clean new API that concentrates on CharSequence and Appendable.
> The implementation class will simply support the old and new interfaces, both working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of this.
> So if somebody adds a TermAttribute, they will get an implementation class that can also be used as a CharTermAttribute. As both attributes create the same impl instance, both calls to addAttribute are equivalent. So a TokenFilter that adds CharTermAttribute to the source will work with the same instance as the Tokenizer that requested the (deprecated) TermAttribute.
> To also support byte[]-only terms, as Collation or NumericField need, a separate getter-only interface will be added that returns a reusable BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will also support this interface. For backwards compatibility with old self-made TermAttribute implementations, the indexer will check with hasAttribute() whether the BytesRef getter interface is present; if it is not, the indexer will wrap the old-style TermAttribute (a deprecated wrapper class will be provided): new BytesRefGetterAttributeWrapper(TermAttribute), which the indexer then uses.

--
This message is automatically generated by JIRA.
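Robert's objection rests on the `java.lang.CharSequence` contract: `toString()` must return a string containing exactly the characters of the sequence, in order. A minimal sketch of why that matters follows. `DebugToken` is a hypothetical class, not Lucene's `Token`, and its `(term,start,end)` debug format is an assumption chosen for illustration. The same object reads as `foo` to code that uses `charAt()`, but as something else to code that calls `toString()`:

```java
// Hypothetical token-like class (NOT Lucene's Token): it implements CharSequence
// but keeps a debug-style toString(), which breaks the CharSequence contract.
class DebugToken implements CharSequence {
    private final char[] buffer;
    private final int startOffset;
    private final int endOffset;

    DebugToken(String term, int startOffset, int endOffset) {
        this.buffer = term.toCharArray();
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override public int length() { return buffer.length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= buffer.length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return buffer[index];
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > buffer.length || start > end) {
            throw new IndexOutOfBoundsException("range: " + start + ".." + end);
        }
        return new String(buffer, start, end - start);
    }

    // Contract violation: CharSequence.toString() should return exactly the
    // characters of the sequence, but this returns "(term,start,end)".
    @Override public String toString() {
        return "(" + new String(buffer) + "," + startOffset + "," + endOffset + ")";
    }
}

class CharSequenceContractDemo {
    // Reads the characters using only length() and charAt().
    static String viaCharAt(CharSequence cs) {
        StringBuilder sb = new StringBuilder(cs.length());
        for (int i = 0; i < cs.length(); i++) {
            sb.append(cs.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DebugToken token = new DebugToken("foo", 0, 3);
        System.out.println(viaCharAt(token));                    // foo
        System.out.println("foo".contentEquals(token));          // true: compares via charAt()
        System.out.println(String.valueOf(token));               // (foo,0,3)
        System.out.println("foo".equals(String.valueOf(token))); // false: toString() disagrees
    }
}
```

Under the second option in the comment, `toString()` would return only the term text (here `new String(buffer)`), and both kinds of consumer would see `foo`.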
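The shared-buffer design in the proposal can be sketched in plain Java. All names here (`OldTermAttribute`, `NewCharTermAttribute`, `TermAttributeImplSketch`, `MiniAttributeSource`) are hypothetical simplifications, not Lucene's `AttributeSource` or attribute factory API. One implementation class backs both interfaces with a single `char[]`, and the registry hands out the same instance for either interface, so a filter appending through the new interface changes what a tokenizer sees through the old one:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the deprecated TermAttribute (NOT the Lucene API).
interface OldTermAttribute {
    String term();
    void setTermBuffer(String text);
}

// Hypothetical stand-in for the proposed CharTermAttribute (NOT the Lucene API).
// append(...) is redeclared without IOException so callers need no try/catch.
interface NewCharTermAttribute extends CharSequence, Appendable {
    NewCharTermAttribute setEmpty();
    @Override NewCharTermAttribute append(CharSequence csq);
    @Override NewCharTermAttribute append(CharSequence csq, int start, int end);
    @Override NewCharTermAttribute append(char c);
}

// One implementation class serving both interfaces from a single char[] buffer.
class TermAttributeImplSketch implements OldTermAttribute, NewCharTermAttribute {
    private char[] buffer = new char[8];
    private int length = 0;
    private void ensureCapacity(int needed) {
        if (needed > buffer.length) buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
    }
    // Old-style API on the shared buffer.
    @Override public String term() { return new String(buffer, 0, length); }
    @Override public void setTermBuffer(String text) { setEmpty().append(text); }
    // CharSequence view; toString() returns exactly the term characters.
    @Override public int length() { return length; }
    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index: " + index);
        return buffer[index];
    }
    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        return new String(buffer, start, end - start);
    }
    @Override public String toString() { return term(); }
    // Appendable view of the same buffer.
    @Override public TermAttributeImplSketch setEmpty() { length = 0; return this; }
    @Override public TermAttributeImplSketch append(CharSequence csq) {
        if (csq == null) csq = "null"; // java.lang.Appendable convention for null
        return append(csq, 0, csq.length());
    }
    @Override public TermAttributeImplSketch append(CharSequence csq, int start, int end) {
        if (csq == null) csq = "null";
        if (start < 0 || start > end || end > csq.length()) throw new IndexOutOfBoundsException();
        ensureCapacity(length + (end - start));
        for (int i = start; i < end; i++) buffer[length++] = csq.charAt(i);
        return this;
    }
    @Override public TermAttributeImplSketch append(char c) {
        ensureCapacity(length + 1);
        buffer[length++] = c;
        return this;
    }
}

// Hypothetical mini registry: both term interfaces map to ONE shared impl
// instance, so either addAttribute call returns the same object.
class MiniAttributeSource {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    <T> T addAttribute(Class<T> iface) {
        Object att = attributes.get(iface);
        if (att == null) {
            if (iface != OldTermAttribute.class && iface != NewCharTermAttribute.class) {
                throw new IllegalArgumentException("unsupported attribute: " + iface.getName());
            }
            TermAttributeImplSketch impl = new TermAttributeImplSketch();
            attributes.put(OldTermAttribute.class, impl); // register under every interface
            attributes.put(NewCharTermAttribute.class, impl);
            att = impl;
        }
        return iface.cast(att);
    }

    public static void main(String[] args) {
        MiniAttributeSource source = new MiniAttributeSource();
        OldTermAttribute oldAtt = source.addAttribute(OldTermAttribute.class);         // e.g. a Tokenizer
        NewCharTermAttribute newAtt = source.addAttribute(NewCharTermAttribute.class); // e.g. a TokenFilter
        oldAtt.setTermBuffer("lucene");
        newAtt.append("-flex");
        System.out.println(oldAtt.term()); // lucene-flex
    }
}
```

Registering the single instance under every interface it implements is what makes the two `addAttribute()` calls equivalent; this sketch hard-codes that mapping instead of deriving it from the implementation class.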
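The backwards-compatibility path for old self-made TermAttribute implementations can be sketched the same way. Every name is again hypothetical (`LegacyTermAttribute`, `BytesGetterAttribute`, `BytesGetterWrapper`, `TinyAttributeSource`), a plain `byte[]` replaces the reusable BytesRef the proposal describes, and UTF-8 is assumed as the term encoding. The indexer prefers the byte getter when `hasAttribute()` reports it and otherwise wraps the old char-based attribute:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an old, self-made TermAttribute (NOT the Lucene API).
interface LegacyTermAttribute {
    String term();
}

// Hypothetical stand-in for the proposed getter-only interface. The proposal returns
// a reusable BytesRef; a plain byte[] keeps this sketch free of Lucene dependencies.
interface BytesGetterAttribute {
    byte[] getBytes();
}

// Deprecated-style adapter: exposes an old char-based term as UTF-8 bytes (assumed encoding).
class BytesGetterWrapper implements BytesGetterAttribute {
    private final LegacyTermAttribute delegate;

    BytesGetterWrapper(LegacyTermAttribute delegate) { this.delegate = delegate; }

    @Override public byte[] getBytes() {
        return delegate.term().getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical minimal attribute holder with a hasAttribute() check.
class TinyAttributeSource {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    <T> void put(Class<T> iface, T impl) { attributes.put(iface, impl); }
    boolean hasAttribute(Class<?> iface) { return attributes.containsKey(iface); }
    <T> T getAttribute(Class<T> iface) { return iface.cast(attributes.get(iface)); }
}

class IndexerSketch {
    // Use the byte getter if the stream provides it; otherwise wrap the old-style attribute.
    static BytesGetterAttribute termBytes(TinyAttributeSource source) {
        if (source.hasAttribute(BytesGetterAttribute.class)) {
            return source.getAttribute(BytesGetterAttribute.class);
        }
        if (source.hasAttribute(LegacyTermAttribute.class)) {
            return new BytesGetterWrapper(source.getAttribute(LegacyTermAttribute.class));
        }
        throw new IllegalStateException("no term attribute present");
    }

    public static void main(String[] args) {
        TinyAttributeSource oldStyle = new TinyAttributeSource();
        oldStyle.put(LegacyTermAttribute.class, () -> "caf\u00e9");
        byte[] bytes = termBytes(oldStyle).getBytes();
        System.out.println(bytes.length); // 5: the accented e takes two bytes in UTF-8
    }
}
```

A real getter would refill one reusable BytesRef per token instead of allocating a new array on every call; the allocation here only keeps the sketch short.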