[ https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842481#action_12842481 ]
Uwe Schindler commented on LUCENE-2302:
---------------------------------------

The name ExtendedTermAttribute is to be discussed :-) Any comments?

> Replacement for TermAttribute+Impl with extended capabilities (byte[]
> support, CharSequence, Appendable)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2302
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>             Fix For: Flex Branch
>
>
> For flexible indexing, terms can be simple byte[] arrays, while the current
> TermAttribute only supports char[]. This is fine for plain text, but e.g.
> NumericTokenStream should work directly on the byte[] array.
> TermAttribute also lacks some interfaces that would make it simpler for
> users to work with: Appendable and CharSequence.
> I propose to create a new interface "ExtendedTermAttribute extends
> TermAttribute". The corresponding -Impl class always implements
> ExtendedTermAttribute. So if somebody adds a TermAttribute to an
> AttributeSource, they get an implementation class that can also be used as
> ExtendedTermAttribute. Because both attributes create the same impl
> instance, both calls to addAttribute are equivalent. So a TokenFilter that
> adds ExtendedTermAttribute to the source will work with the same instance as
> the Tokenizer that requested the (deprecated) TermAttribute.
> To support both byte[] and char[], the internals will be implemented like
> Token in 2.9, which supports both String and char[]. Both buffers are
> available, but only one of them can be in use at a time. As soon as you call
> getByteBuffer() while the char[] buffer is in use, the contents are
> transformed. So the indexer will always call getBytes() and get the UTF-8 bytes.
> NumericTokenStream will modify the byte[] directly, and if no filter that uses
> char[] is plugged on top, the buffer is never transformed.
> This issue will also convert the rest of NRQ to byte[] and deprecate all old
> methods in NumericUtils. NRQ will directly request ByteRef from splitRange
> and so on.
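
To make the "two buffers, only one in use" idea from the description easier to discuss, here is a minimal Java sketch of a term holder that converts lazily between a char view and a UTF-8 byte view and also implements CharSequence and Appendable. It is an illustration under assumptions, not the proposed patch: the class name DualBufferTerm, the setBytes() method and the conversions() counter are hypothetical, and only getBytes() borrows its name from the description. It also assumes the bytes are valid UTF-8 whenever the char view is requested, which would not hold for arbitrary numeric terms.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not the class proposed in LUCENE-2302. */
final class DualBufferTerm implements CharSequence, Appendable {
    private StringBuilder chars = new StringBuilder(); // active while bytes == null
    private byte[] bytes;                              // active while non-null
    private int conversions;                           // counts buffer transformations

    /** Makes the char view active, decoding the UTF-8 bytes if necessary. */
    private StringBuilder charView() {
        if (bytes != null) {
            chars = new StringBuilder(new String(bytes, StandardCharsets.UTF_8));
            bytes = null;
            conversions++;
        }
        return chars;
    }

    /** Makes the byte view active, encoding the chars as UTF-8 if necessary. */
    public byte[] getBytes() {
        if (bytes == null) {
            bytes = chars.toString().getBytes(StandardCharsets.UTF_8);
            conversions++;
        }
        return bytes;
    }

    /** Installs raw bytes directly, as a byte-oriented producer might do. */
    public void setBytes(byte[] source) {
        bytes = source.clone();
    }

    /** Number of char/byte transformations performed so far. */
    public int conversions() {
        return conversions;
    }

    @Override public int length() { return charView().length(); }
    @Override public char charAt(int index) { return charView().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return charView().subSequence(start, end);
    }
    @Override public String toString() { return charView().toString(); }

    @Override public DualBufferTerm append(CharSequence csq) {
        charView().append(csq);
        return this;
    }
    @Override public DualBufferTerm append(CharSequence csq, int start, int end) {
        charView().append(csq, start, end);
        return this;
    }
    @Override public DualBufferTerm append(char c) {
        charView().append(c);
        return this;
    }

    public static void main(String[] args) {
        // Text path: chars are written first, bytes are produced once on demand.
        DualBufferTerm text = new DualBufferTerm();
        text.append("foo").append('d');
        System.out.println(text + ": " + text.getBytes().length
                + " bytes, conversions=" + text.conversions());

        // Byte path: bytes are written and modified in place, no char filter on top.
        DualBufferTerm numeric = new DualBufferTerm();
        numeric.setBytes(new byte[] {0x20, 0x01, 0x02});
        numeric.getBytes()[2] = 0x03;
        System.out.println("byte path conversions=" + numeric.conversions());
    }
}
```

In this sketch a producer that only ever touches the byte view never triggers a transformation, which mirrors the NumericTokenStream case described above; the first char-oriented call on top of it pays for exactly one decode.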