[ https://issues.apache.org/jira/browse/LUCENE-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882315#action_12882315 ]
Uwe Schindler commented on LUCENE-2514:
---------------------------------------

Robert: I can take MTQ tomorrow. I think we can remove the whole backwards stuff from MTQ and change completely to BytesRef (internally). This simplifies the chain TermsEnum (bytes) -> TermCollector -> TermQuery, which currently converts all the time. The abstract collector class in the MTQ rewrites will be much nicer. I can also remove the rest of the pre-BoostAttribute stuff from TopTermsRewrite.

I will go to sleep now, more tomorrow...

> Change Term to use bytes
> ------------------------
>
>                 Key: LUCENE-2514
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2514
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Search
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-2514-surrogates-dance.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch, LUCENE-2514.patch
>
>
> In LUCENE-2426, the sort order was changed to codepoint order.
> Unfortunately, Term is still using String internally, and more importantly
> its compareTo() uses the wrong order (UTF-16).
> So MultiTermQuery, etc. (especially its priority queues) are currently wrong.
> By changing Term to use bytes, we can also support terms encoded as bytes,
> such as numerics, instead of using strange string encodings.
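
The ordering mismatch described in the issue can be reproduced without Lucene. The following is only a minimal sketch in plain Java, not code from any of the attached patches; the class and method names (TermOrderDemo, compareUtf16, compareUtf8) are hypothetical. It compares a BMP character above the surrogate range with a supplementary character, once with String.compareTo() (UTF-16 code unit order) and once on the unsigned UTF-8 bytes (which matches codepoint order).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use or mirror any Lucene API.
class TermOrderDemo {

    // UTF-16 code unit order, which is what String.compareTo() implements.
    public static int compareUtf16(String a, String b) {
        return a.compareTo(b);
    }

    // Lexicographic comparison of the UTF-8 encodings as unsigned bytes.
    // For valid UTF-8 this is the same as Unicode codepoint order.
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        // U+FF5E is in the BMP, above the surrogate range D800..DFFF.
        String bmp = "\uFF5E";
        // U+10400 is supplementary; in UTF-16 it is the pair D801 DC00.
        String supp = new String(Character.toChars(0x10400));

        // UTF-16 order puts the BMP character after the supplementary one,
        // because 0xFF5E is greater than the lead surrogate 0xD801.
        System.out.println("UTF-16: bmp sorts after supp = "
                + (compareUtf16(bmp, supp) > 0));

        // UTF-8 byte order puts it before, because the lead byte 0xEF
        // is less than 0xF0. This agrees with codepoint order.
        System.out.println("UTF-8:  bmp sorts before supp = "
                + (compareUtf8(bmp, supp) < 0));
    }
}
```

The two orders only disagree when a supplementary character is compared against a BMP character in the range U+E000..U+FFFF, so a priority queue keyed on String.compareTo() would misorder only such terms relative to an index sorted in codepoint order.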