[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053319#comment-13053319 ]
Robert Muir commented on LUCENE-3080:
-------------------------------------

Well, personally I am hesitant to introduce any encodings or bytes into our current analysis chain, because it's unnecessary complexity that will introduce bugs (at the moment, it's the user's responsibility to create the appropriate Reader etc).

Furthermore, not all character sets can be 'corrected' with a linear conversion like this: for example, some actually order the text in a different direction, and things like that... there are many quirks to non-Unicode character sets.

Maybe as a start, it would be useful to prototype some simple experiments with a "binary analysis chain" and hack up a highlighter to work with them? This way we would have an understanding of what the potential performance gain is.

Here's some example code for a dead simple binary analysis chain that only uses bytes the whole way through. You could take these ideas and prototype one with just all-ASCII terms that splits on the space byte and such:

http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestBinaryTerms.java
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/BinaryTokenStream.java

> cutover highlighter to BytesRef
> -------------------------------
>
>                 Key: LUCENE-3080
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3080
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Michael McCandless
>
> Highlighter still uses char[] terms (consumes tokens from the analyzer as
> char[] not as BytesRef), which is causing problems for merging SOLR-2497 to
> trunk.
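
As a rough illustration of the prototype suggested in the comment above (all-ASCII terms, split on the space byte), here is a minimal sketch in plain Java. It is an assumption-laden toy, not Lucene code: SpaceByteTokenizer and Slice are hypothetical names, no Lucene API is used, and the input is assumed to be pure ASCII. Each term is kept as an (offset, length) slice of the original byte array, so no chars are ever materialized during tokenization.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not a Lucene class: splits ASCII input on the space byte. */
final class SpaceByteTokenizer {
    private static final byte SPACE = 0x20;

    /** One term, described as a slice of the shared input array. */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Decodes the slice for display only; assumes the bytes are ASCII. */
        String asAscii() {
            return new String(bytes, offset, length, StandardCharsets.US_ASCII);
        }
    }

    /** Returns the non-empty runs of bytes between space bytes, without copying. */
    static List<Slice> tokenize(byte[] input) {
        List<Slice> terms = new ArrayList<>();
        int start = 0;
        // Iterate one position past the end so the final term is flushed too.
        for (int i = 0; i <= input.length; i++) {
            boolean atEnd = (i == input.length);
            if (atEnd || input[i] == SPACE) {
                if (i > start) {
                    terms.add(new Slice(input, start, i - start));
                }
                start = i + 1;
            }
        }
        return terms;
    }
}
```

Because the slices carry byte offsets into the original input, a hacked-up highlighter could in principle mark matches directly on the byte array. Whether that yields a measurable performance gain is exactly what the proposed experiment would need to show.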