[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053319#comment-13053319
 ] 

Robert Muir commented on LUCENE-3080:
-------------------------------------

Well, personally I am hesitant to introduce any encodings or bytes into our 
current analysis chain, because it's unnecessary complexity that will introduce 
bugs (at the moment, it's the user's responsibility to create the appropriate 
Reader, etc.).
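
To illustrate what "create the appropriate Reader" means in practice, here is a minimal sketch using only the JDK. The class and method names (ReaderSetup, open, readAll) are hypothetical and not part of Lucene; the point is only that the caller picks the charset and the analysis chain sees already-decoded chars:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not a Lucene class. */
final class ReaderSetup {

    /** The caller, not the analysis chain, decides how bytes become chars. */
    static Reader open(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    /** Drains a Reader into a String, standing in for whatever consumes it. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes in a legacy encoding; the user knows this and says so explicitly.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Reader r = open(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(readAll(r));
    }
}
```

The resulting Reader is what would be handed to an analyzer, so no encoding knowledge has to live inside the chain itself.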

Furthermore, not all character sets can be 'corrected' with a linear conversion 
like this: for example some actually order the text in a different direction, 
and things like that... there are many quirks to non-Unicode character sets.

Maybe as a start, it would be useful to prototype some simple experiments with 
a "binary analysis chain" and hack up a highlighter to work with it? This way 
we would have an understanding of what the potential performance gain is.

Here's some example code for a dead simple binary analysis chain that only uses 
bytes the whole way through. You could take these ideas and prototype one with 
all-ASCII terms that splits on the space byte and such:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestBinaryTerms.java
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/BinaryTokenStream.java
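
As a rough sketch of the kind of prototype meant here, the plain-Java code below splits an ASCII buffer on the space byte and "highlights" matching terms without ever decoding to chars. It deliberately does not use the Lucene classes linked above; ByteSpaceTokenizer, Slice, and the [ ] markers are hypothetical names chosen for illustration, and it assumes all-ASCII input:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of Lucene. Assumes ASCII-only input. */
final class ByteSpaceTokenizer {

    /** A term expressed as a slice of the input buffer (no copy, no decoding). */
    static final class Slice {
        final int offset;
        final int length;

        Slice(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits on the ASCII space byte (0x20); runs of spaces yield no empty terms. */
    static List<Slice> tokenize(byte[] input) {
        List<Slice> terms = new ArrayList<>();
        int start = -1; // start of the current term, or -1 when between terms
        for (int i = 0; i < input.length; i++) {
            if (input[i] == ' ') {
                if (start >= 0) {
                    terms.add(new Slice(start, i - start));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            terms.add(new Slice(start, input.length - start));
        }
        return terms;
    }

    /** Compares a slice against the query term byte by byte. */
    static boolean matches(byte[] input, Slice s, byte[] query) {
        if (s.length != query.length) {
            return false;
        }
        for (int i = 0; i < query.length; i++) {
            if (input[s.offset + i] != query[i]) {
                return false;
            }
        }
        return true;
    }

    /** Wraps every term equal to the query in '[' and ']', working purely on bytes. */
    static byte[] highlight(byte[] input, byte[] query) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0; // end of the region already copied to the output
        for (Slice s : tokenize(input)) {
            if (matches(input, s, query)) {
                out.write(input, last, s.offset - last);
                out.write('[');
                out.write(input, s.offset, s.length);
                out.write(']');
                last = s.offset + s.length;
            }
        }
        out.write(input, last, input.length - last);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] text = "quick brown fox".getBytes(StandardCharsets.US_ASCII);
        byte[] query = "brown".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(highlight(text, query), StandardCharsets.US_ASCII));
    }
}
```

Timing something like this against the equivalent char[]-based path would give a first idea of whether the gain is worth the extra complexity.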
 


> cutover highlighter to BytesRef
> -------------------------------
>
>                 Key: LUCENE-3080
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3080
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Michael McCandless
>
> Highlighter still uses char[] terms (consumes tokens from the analyzer as 
> char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
> trunk.
