[ 
https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842481#action_12842481
 ] 

Uwe Schindler commented on LUCENE-2302:
---------------------------------------

The name ExtendedTermAttribute is to be discussed :-) Any comments?

> Replacement for TermAttribute+Impl with extended capabilities (byte[] 
> support, CharSequence, Appendable)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2302
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>             Fix For: Flex Branch
>
>
> For flexible indexing, terms can be simple byte[] arrays, while the current 
> TermAttribute only supports char[]. This is fine for plain text, but e.g. 
> NumericTokenStream should work directly on the byte[] array.
> Also, TermAttribute lacks some interfaces that would make it simpler for 
> users to work with: Appendable and CharSequence.
> I propose to create a new interface "ExtendedTermAttribute extends 
> TermAttribute". The corresponding -Impl class is always an implementation 
> of ExtendedTermAttribute. So if somebody adds a TermAttribute to an 
> AttributeSource, he will get an implementation class that can also be used 
> as ExtendedTermAttribute. As both attributes create the same impl instance, 
> both calls to addAttribute are equivalent. So a TokenFilter that adds 
> ExtendedTermAttribute to the source will work with the same instance as the 
> Tokenizer that requested the (deprecated) TermAttribute.
> To support both byte[] and char[], the internals will be implemented like 
> Token in 2.9, which supports both String and char[]. Both buffers are 
> available, but only one of them is in use at a time. As soon as you call 
> getByteBuffer() while the char[] buffer is in use, it will be transformed. 
> So the indexer will always call getBytes() and get the UTF-8 bytes. 
> NumericTokenStream will modify the byte[] directly, and if no filter that 
> uses char[] is plugged on top, the buffer is never transformed.
> This issue will also convert the rest of NRQ to byte[] and deprecate all old 
> methods in NumericUtils. NRQ will directly request ByteRef from splitRange 
> and so on.
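
The dual-buffer idea in the description can be illustrated with a small
sketch. This is not the proposed ExtendedTermAttribute implementation: the
class name DualBufferTerm, the setBytes() method, and the conversions counter
are hypothetical and exist only for this example. The sketch also assumes the
bytes are valid UTF-8 whenever a char view is requested, which would not hold
for the raw bytes of a NumericTokenStream.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only, not the class proposed in LUCENE-2302.
// Holds a char buffer and a byte buffer; only one is authoritative at a time,
// and the other is filled lazily when it is first requested.
final class DualBufferTerm implements CharSequence, Appendable {
    private final StringBuilder chars = new StringBuilder();
    private byte[] bytes = new byte[0];
    private boolean bytesAreCurrent = false; // true: bytes are authoritative
    int conversions = 0;                     // demo counter of transformations

    // Byte view: transforms only if the char buffer was the one in use.
    byte[] getBytes() {
        if (!bytesAreCurrent) {
            bytes = chars.toString().getBytes(StandardCharsets.UTF_8);
            bytesAreCurrent = true;
            conversions++;
        }
        return bytes;
    }

    // Direct byte access, as a numeric stream might use (assumed method).
    void setBytes(byte[] newBytes) {
        bytes = newBytes.clone();
        bytesAreCurrent = true;
    }

    // Char view: transforms only if the byte buffer was the one in use.
    // Assumes the bytes are valid UTF-8.
    private StringBuilder charView() {
        if (bytesAreCurrent) {
            chars.setLength(0);
            chars.append(new String(bytes, StandardCharsets.UTF_8));
            bytesAreCurrent = false;
            conversions++;
        }
        return chars;
    }

    // CharSequence
    @Override public int length() { return charView().length(); }
    @Override public char charAt(int index) { return charView().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return charView().subSequence(start, end);
    }
    @Override public String toString() { return charView().toString(); }

    // Appendable (covariant return types, no checked exception needed)
    @Override public DualBufferTerm append(CharSequence csq) {
        charView().append(csq);
        return this;
    }
    @Override public DualBufferTerm append(CharSequence csq, int start, int end) {
        charView().append(csq, start, end);
        return this;
    }
    @Override public DualBufferTerm append(char c) {
        charView().append(c);
        return this;
    }

    public static void main(String[] args) {
        DualBufferTerm text = new DualBufferTerm();
        text.append("lucene");
        System.out.println(text.getBytes().length + " bytes, conversions="
            + text.conversions);

        DualBufferTerm raw = new DualBufferTerm();
        raw.setBytes(new byte[] {0x12, 0x34});
        System.out.println(raw.getBytes().length + " bytes, conversions="
            + raw.conversions);
    }
}
```

In this sketch a term that is only ever written and read as bytes is never
transformed, while a term filled through append() is converted once when the
byte view is requested.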


