Re: [cp-patches] RFC: patch for Unicode scalar value to UTF-16conversion

Chris Burdess Sun, 08 Jan 2006 02:46:10 -0800

Jeroen Frijters wrote:
> > > I think it is better to use the named constants instead of 
> > the values.
> > 
> > Fair enough, but what about the casting: doesn't that make it 
> > less efficient?
> 
> No, it's a no-op (and not even needed in the source).


OK

> > > I also have a mild preference for the bit-shifting and masking
> > > computation, but this is more minor.
> > 
> > The problem is that the bit-shifting method produces the wrong result.
> 
> In this case they both produce the same result. I think the problem was
> that the original code forgot to subtract 0x10000.

I see. So, is there a good reason to do it this way (bitshifting and
bitmasking) rather than the arithmetical methods (which presumably get
compiled to the same thing, but are much more straightforward to
understand)?

Here is an updated patch that includes the corresponding conversion back
to a Unicode scalar (the original code also forgot to add 0x10000).

Should I commit this? Or should it be bitshifting and bitmasking?
-- 
Chris Burdess
  "They that can give up essential liberty to obtain a little safety
  deserve neither liberty nor safety." - Benjamin Franklin

Index: java/lang/Character.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/Character.java,v
retrieving revision 1.40
diff -u -r1.40 Character.java
--- java/lang/Character.java    17 Sep 2005 21:58:41 -0000      1.40
+++ java/lang/Character.java    8 Jan 2006 10:39:21 -0000
@@ -2410,11 +2410,11 @@
       {
         // Write second char first to cause IndexOutOfBoundsException
         // immediately.
-        dst[dstIndex + 1] = (char) ((codePoint & 0x3ff)
-                                    + (int) MIN_LOW_SURROGATE );
-        dst[dstIndex] = (char) ((codePoint >> 10) + (int) MIN_HIGH_SURROGATE);
+        final int cp2 = codePoint - 0x10000;
+        dst[dstIndex + 1] = (char) ((cp2 % 0x400) + (int) MIN_LOW_SURROGATE);
+        dst[dstIndex] = (char) ((cp2 / 0x400) + (int) MIN_HIGH_SURROGATE);
         result = 2;
-    }
+      }
     else
       {
         dst[dstIndex] = (char) codePoint;
@@ -2523,7 +2523,8 @@
    */
   public static int toCodePoint(char high, char low)
   {
-    return ((high - MIN_HIGH_SURROGATE) << 10) + (low - MIN_LOW_SURROGATE);
+    return ((high - MIN_HIGH_SURROGATE) * 0x400) +
+      (low - MIN_LOW_SURROGATE) + 0x10000;
   }
 
   /**

pgpvBRBqyWdSh.pgp
Description: PGP signature

_______________________________________________
Classpath-patches mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/classpath-patches

Re: [cp-patches] RFC: patch for Unicode scalar value to UTF-16conversion

Reply via email to