Surprising, but this looks to me like a bug in Java's collation rules for en-US. According to http://developer.mimer.com/collations/charts/UCA_latin.htm, \u00D8 (Latin Capital Letter O With Stroke) should sort before U, which implies -1 is the correct result. Java returns 1 at every strength of the collator. Maybe there is some other subtlety with this character...
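For anyone who wants to reproduce the strength check, here is a minimal sketch. The class and method names are made up for illustration and are not part of Lucene. It only prints the sign of the comparison at each java.text.Collator strength; the result may depend on the JDK version and its locale data, so I am not quoting an expected output.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class CollationStrengthDemo {

    // Collator strengths, from loosest to strictest.
    static final int[] STRENGTHS = {
        Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
    };
    static final String[] NAMES = {"PRIMARY", "SECONDARY", "TERTIARY", "IDENTICAL"};

    // Returns the sign (-1, 0 or 1) of collator.compare(a, b) at each strength,
    // in the same order as STRENGTHS.
    static int[] compareAtAllStrengths(String a, String b, Locale locale) {
        int[] signs = new int[STRENGTHS.length];
        for (int i = 0; i < STRENGTHS.length; i++) {
            // getInstance returns a fresh Collator, so setting the strength
            // here does not affect other callers.
            Collator collator = Collator.getInstance(locale);
            collator.setStrength(STRENGTHS[i]);
            signs[i] = Integer.signum(collator.compare(a, b));
        }
        return signs;
    }

    public static void main(String[] args) {
        // Same strings as in George's test case.
        int[] signs = compareAtAllStrengths("H\u00D8T", "HUT", Locale.US);
        for (int i = 0; i < signs.length; i++) {
            System.out.println(NAMES[i] + ": " + signs[i]);
        }
    }
}
```

A negative sign means "H\u00D8T" sorts before "HUT", which is what the UCA chart suggests; a positive sign matches what Java gave here.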
Chuck

George Aroush wrote on 12/13/2006 04:20 PM:
> Hi folks,
>
> Over at Lucene.Net, I have run into a NUnit test which is failing with
> Lucene.Net (C#) but is passing with Lucene (Java). The two tests that fail
> are: TestInternationalMultiSearcherSort and TestInternationalSort
>
> After several hours of investigation, I narrowed the problem to what I
> believe is a difference in the way Java and .NET implement compare.
>
> The code in question is this method (found in FieldSortedHitQueue.java):
>
>     public final int compare (final ScoreDoc i, final ScoreDoc j) {
>         return collator.compare (index[i.doc], index[j.doc]);
>     }
>
> To demonstrate the compare problem (Java vs. .NET) I created this simple code
> both in Java and C#:
>
>     // Java code: you get back 1 for 'diff'
>     String s1 = "H\u00D8T";
>     String s2 = "HUT";
>     Collator collator = Collator.getInstance (Locale.US);
>     int diff = collator.compare(s1, s2);
>
>     // C# code: you get back -1 for 'res'
>     string s1 = "H\u00D8T";
>     string s2 = "HUT";
>     System.Globalization.CultureInfo locale = new System.Globalization.CultureInfo("en-US");
>     System.Globalization.CompareInfo collator = locale.CompareInfo;
>     int res = collator.Compare(s1, s2);
>
> Java will give me back a 1 while .NET gives me back -1.
>
> So, what I am trying to figure out is who is doing the right thing? Or am I
> missing additional calls before I can compare?
>
> My goal is to understand why the difference exists and thus based on that
> understanding I can judge how serious this issue is and find a fix for it or
> just document it as a language difference between Java and .NET.
>
> Btw, this is based on Lucene 2.0 for both Java and C# Lucene.
>
> Regards,
>
> -- George Aroush