Weird behaviour on char boxing

Juan Orellana Wed, 23 Mar 2016 03:04:20 -0700

I recently downloaded the latest 4.5 from GitHub 
https://github.com/apache/lucenenet/ and started playing around with Lucene.
When I ran some of the tests I noticed weird behavior in the 
RandomlyRecaseCodePoints method of the TestUtil class (“TestUtil.cs”).
The test generates random text, and it sometimes fails on certain special 
strings that may be invalid.


The error seems to be on these lines:

case 0:
    builder.Append(char.ToUpper((char)codePoint));
    break;

case 1:
    builder.Append(char.ToLower((char)codePoint));
    break;

case 2: // leave intact
    builder.Append((char)codePoint);
    break;

The (char)codePoint cast seems to truncate the integer code point to 16 bits, 
so for any code point above U+FFFF you get the wrong result back, and the test 
fails because the length of the text is no longer the same.
I don’t get this behavior when I run the same text through the Java version of 
Lucene (RandomlyRecaseCodePoints).
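
To make the suspected truncation concrete, here is a minimal sketch comparing the cast with char.ConvertFromUtf32. The class and method names are hypothetical, and U+10400 in the usage note is just an arbitrary supplementary code point chosen for illustration, not one taken from the failing test.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class TruncationDemo
{
    // Casting an int to char keeps only the low 16 bits
    // (unchecked is the default context; it is spelled out here for clarity).
    public static char TruncatingCast(int codePoint)
    {
        return unchecked((char)codePoint);
    }

    // ConvertFromUtf32 returns one char for code points up to U+FFFF
    // and a two-char surrogate pair for anything above.
    public static string FullConversion(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }
}
```

For example, TruncationDemo.TruncatingCast(0x10400) yields the single char U+0400, while TruncationDemo.FullConversion(0x10400) yields a string of length 2, which would explain a length mismatch like the one the test reports.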

I made a quick fix, and this code seems to solve the problem, but I haven’t 
tested it thoroughly.

var stringValue = char.ConvertFromUtf32(codePoint);

switch (NextInt(random, 0, 2))
{
    case 0:
        var value0 = stringValue.ToUpper();
        builder.Append(value0);
        break;

    case 1:
        var value1 = stringValue.ToUpper().ToLower();
        builder.Append(value1);
        break;

    case 2: // leave intact
        builder.Append(stringValue);
        break;
}
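
For context, the same idea can be sketched as a complete loop that walks the input one code point at a time. This is only an illustration under stated assumptions: RecaseSketch is a hypothetical name, System.Random.Next stands in for TestUtil's NextInt, and the invariant-culture casing methods are used instead of ToUpper/ToLower. Whether supplementary characters are actually case-mapped depends on the runtime.

```csharp
using System;
using System.Text;

// Hypothetical sketch, not the actual TestUtil implementation.
public static class RecaseSketch
{
    public static string RandomlyRecase(Random random, string str)
    {
        var builder = new StringBuilder(str.Length);
        int pos = 0;
        while (pos < str.Length)
        {
            // Take two chars for a valid surrogate pair, otherwise one char
            // (a lone surrogate is passed through as a single char).
            int len = char.IsSurrogatePair(str, pos) ? 2 : 1;
            string element = str.Substring(pos, len);
            pos += len;

            switch (random.Next(0, 3)) // upper bound is exclusive: 0, 1 or 2
            {
                case 0:
                    builder.Append(element.ToUpperInvariant());
                    break;
                case 1:
                    builder.Append(element.ToLowerInvariant());
                    break;
                default: // leave intact
                    builder.Append(element);
                    break;
            }
        }
        return builder.ToString();
    }
}
```

Because the surrogate pair is never split, the result should keep the same number of chars as the input, which is the property the failing test checks.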

The text I got when running the test was hex F2 BA 81 B2 20.
I made a bin file and added those hex values with a hex editor; that was the 
only way to test the same “incorrect” string repeatably.
(I attached the file I used to this mail: “failedString.bin”.)
Then I read the text with File.ReadAllText in LINQPad and tested the 
RandomlyRecaseCodePoints method with the string.
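
As an alternative to a binary file, the five bytes quoted above can be decoded directly in code. The sketch below assumes they are UTF-8, which is also the default File.ReadAllText falls back to; FailingInput is a hypothetical name.

```csharp
using System;
using System.Text;

// Hypothetical helper for reproducing the input without a file.
public static class FailingInput
{
    public static string Decode()
    {
        // The bytes reported in this message: F2 BA 81 B2 20.
        byte[] bytes = { 0xF2, 0xBA, 0x81, 0xB2, 0x20 };
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Decoded this way, the string has three chars: a surrogate pair for the code point U+BA072 (char.ConvertToUtf32(s, 0) returns 0xBA072) followed by a space. Casting 0xBA072 to char would give U+A072, a single char, so the recased string comes out one char shorter.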

Has anyone else noticed this problem?

Juan Orellana
System developer

Gustavslundsvägen 12
+46 (0)8 566 229 942
[email protected]

NORDIC NETPRODUCTS AB
Box 14113, SE-167 14 Bromma
+46 (0)8 566 229 00
www.nordicnet.se | www.largestcompanies.se
