RE: Sort differences between .NET and Java in Lucene.Net 2.0

George Aroush Wed, 13 Dec 2006 08:36:05 -0800

Hi Torsten,

Thanks for the explanation and the sample program.  However, if you change
your code so that "HUT" is used instead of "HOT" (per my original email),
the value returned is now 1 instead of -1.  In Java I get -1, which I
believe is the right answer.


This is why those two tests are failing, and I wonder whether this is a defect
in .NET, a difference in how culture info is used on the two platforms, or
whether there are more culture settings I have to configure in .NET.

My thinking is that, during the comparison, .NET is treating "\u00D8" as ASCII
"O" and not as the Unicode character that it really is.

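One way to check this hypothesis is to compare the same pair of strings both
ordinally and with the en-US culture rules. The following is only a minimal
sketch, not code from Lucene.Net; the names CompareCheck, Ordinal and Cultural
are made up for illustration. Ordinal comparison looks only at UTF-16 code
units, so "\u00D8" (0xD8) sorts after "U" (0x55). The culture-sensitive result
depends on the collation data of the runtime, so no expected value is given
for it.

```csharp
using System;
using System.Globalization;

public static class CompareCheck
{
    // Ordinal: compares UTF-16 code units, no culture rules involved.
    public static int Ordinal(string a, string b)
    {
        return Math.Sign(String.CompareOrdinal(a, b));
    }

    // Culture-sensitive: uses the collation rules of the named culture.
    public static int Cultural(string a, string b, string cultureName)
    {
        CompareInfo ci = CultureInfo.GetCultureInfo(cultureName).CompareInfo;
        return Math.Sign(ci.Compare(a, b, CompareOptions.None));
    }

    public static void Main()
    {
        string a = "H\u00D8T";
        string b = "HUT";
        Console.WriteLine("ordinal: {0}", Ordinal(a, b)); // 1, since 0xD8 > 0x55
        Console.WriteLine("en-US:   {0}", Cultural(a, b, "en-US"));
    }
}
```

If the two lines disagree, that would point to the collation rules of the
culture rather than to the string losing its Unicode content.
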
Regards,

-- George Aroush
 

-----Original Message-----
From: Torsten Rendelmann [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 2:15 AM
To: [email protected];
[email protected]
Subject: RE: Sort differences between .NET and Java in Lucene.Net 2.0

Hi George,

The CLR always handles "string" as Unicode, but culture-sensitive comparisons
such as a.CompareTo(b) or String.Compare(a, b) use the current culture by
default (by contrast, "a" == "b" is an ordinal equality check). So it is
better to call a String.Compare() overload that takes explicit arguments;
there you have everything that influences the result at hand: the comparer,
the culture, case sensitivity, etc.

I tested a little with CLR 2.0 (the String.Compare() calls are similar in
CLR 1.1/1.0); here is the code, with the results as comments:

using System;
using System.Diagnostics;
using System.Globalization;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Place this method inside a [TestClass] class.
[TestMethod]
public void TestMethod1()
{
        string one = "HOT";
        string two = "H\u00D8T";

        // Culture-sensitive comparison using the current culture.
        int res = String.Compare(one, two); // -1
        Debug.WriteLine(String.Format("String.Compare(one, two): {0}", res));

        // Ordinal comparison: difference of the code units 'O' and '\u00D8'.
        res = String.CompareOrdinal(one, two); // -137
        Debug.WriteLine(String.Format("String.CompareOrdinal(one, two): {0}", res));

        // Culture-sensitive comparison using the invariant culture.
        res = String.Compare(one, two, StringComparison.InvariantCulture); // -1
        Debug.WriteLine(String.Format(
                "String.Compare(one, two, StringComparison.InvariantCulture): {0}", res));

        // Explicit en-US culture, ignoring case.
        res = String.Compare(one, two, true,
                CultureInfo.CreateSpecificCulture("en-US")); // -1
        Debug.WriteLine(String.Format(
                "String.Compare(one, two, true, CultureInfo.CreateSpecificCulture('en-US')): {0}", res));

        // Explicit en-US culture, case-sensitive.
        res = String.Compare(one, two, false,
                CultureInfo.CreateSpecificCulture("en-US")); // -1
        Debug.WriteLine(String.Format(
                "String.Compare(one, two, false, CultureInfo.CreateSpecificCulture('en-US')): {0}", res));
}

String.Compare() doc:
 result < 0     String one is less than two
 result == 0    String one is equal to two
 result > 0     String one is greater than two
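
Only the sign of the result is documented; the -137 from
String.CompareOrdinal() above shows that the magnitude can vary. A small
sketch of how a test could normalize a result before checking it (the names
CompareSign and SignOf are hypothetical, used here only for illustration):

```csharp
using System;

public static class CompareSign
{
    // Collapse any comparison result to -1, 0 or 1.
    public static int SignOf(int compareResult)
    {
        return Math.Sign(compareResult);
    }

    public static void Main()
    {
        int raw = String.CompareOrdinal("HOT", "H\u00D8T");
        // -1: 'O' (0x4F) precedes '\u00D8' (0xD8) in code-unit order.
        Console.WriteLine(SignOf(raw));
    }
}
```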

Kindly, TorstenR

> -----Original Message-----
> From: George Aroush [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, December 13, 2006 5:46 AM
> To: [email protected];
> [email protected]
> Subject: Sort differences between .NET and Java in Lucene.Net 2.0
> 
> Hi folks,
> 
> One of the remaining issues with Lucene.Net 2.0 is two tests that are 
> failing, TestInternationalMultiSearcherSort and TestInternationalSort.
> 
> After a few hours of debugging, I discovered that in C#, "H\u00D8T" < "HUT",
> but in Java, "H\u00D8T" > "HUT" (here "\u00D8" is the Unicode escape for
> "Ø").
> 
> The culture info / locale used is "en-US" in C# and "Locale.US" in Java.
> 
> The failure occurs because, I think, System.Globalization.CompareInfo is
> not treating the string as Unicode; "\u00D8" is being treated as ASCII "O".
> If that's the case, how do I tell .NET to use Unicode?
> 
> If you know why .NET is behaving differently here, please let me know.
> 
> Regards,
> 
> -- George Aroush
>
