Re: java.lang.Character lacuna #2 of 2

Xueming Shen Thu, 14 Apr 2011 20:30:42 -0700

Tom,

Welcome back :-) Have you seen that cool \x{h...h}? Oh, you saw it :-)

Yes, it might be desirable to have a corresponding getCodePointFromName(String name); at least I will need that when I do \N{unicode_name} in regex. But I'm not sure if it is worth making it a method in j.l.Character or just keeping it as an implementation detail in j.u.regex. I believed it was cool/fun and went ahead and put j.l.Character.getName() into jdk7, but I'm sure some people might not be convinced that this method is really useful enough for "normal" developers. I'm also a little worried that the Unicode Standard keeps adding new characters every release, so the data file will get bigger and bigger. I managed to keep the name data file around 108k for 6.0; I hope the 7.0 one is not going to be too big.

I will go through your other emails and file the corresponding CRs later, and see if we can get in several easy doc re-word/typo changes. It might be too late for JDK7 for those that might be categorized as "API change". I'm still struggling with a nasty race condition bug in my zip area, which I hope and need to close before next Monday, so forgive me if I suddenly go silent for a couple of days.


-Sherman


On 04-14-2011 7:51 PM, Tom Christiansen wrote:

Sherman,

The other code thing that I saw, but also of course did not fix given
where you are in the release cycle, was another of these mysterious
non-parallel things.  You have a

    String getName(int codePoint)

function (well, static method) which takes a code point (like U+0130) and
produces a string ("LATIN CAPITAL LETTER I WITH DOT ABOVE").  But you have
no corresponding inverse function!  You have a get name from char but no
get char from name.

Have I maybe missed something?  Is it somewhere I didn't notice?

This is important so that people stop having to put ugly magic
numbers in their source code.  Which do you prefer, eh? :)

     int leftQ  = 0x2039;
     int rightQ = 0x203A;

vs:

     int leftQ  = getCharFromName("SINGLE LEFT-POINTING ANGLE QUOTATION MARK");
     int rightQ = getCharFromName("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK");

See

     http://icu-project.org/apiref/icu4j/com/ibm/icu/lang/UCharacter.html#getCharFromName(java.lang.String)

which has these nifty paired, parallel functions so you can go both ways:

     static int getCharFromName(java.lang.String name)
           Finds a Unicode code point by its most current Unicode
           name and return its code point value.
     static int getCharFromName1_0(java.lang.String name)
           Find a Unicode character by its version 1.0 Unicode name and
           return its code point value.
     static int getCharFromNameAlias(java.lang.String name)
           Find a Unicode character by its corrected name alias and
           return its code point value.
     static java.lang.String getName(int ch)
           Returns the most current Unicode name of the argument code
           point, or null if the character is unassigned or outside the
           range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not
           have a name.
     static java.lang.String getName(java.lang.String s, java.lang.String separator)
           Returns the names for each of the characters in a string
     static java.lang.String getNameAlias(int ch)
           Returns the corrected name from NameAliases.txt if there is one.

Maybe this is something you might be able to consider for JDK8.
It's not really a bug like the other things, but it sure would
make sense to have, and be a great convenience.
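
For illustration only, the inverse lookup can be approximated on top of the existing Character.getName(int). This is a minimal sketch, not a proposal for the real implementation: the class NameLookup and the method codePointFromName are hypothetical names (neither a JDK nor an ICU method), the match is exact after trimming and upper-casing rather than following Unicode's loose name-matching rules, and the linear scan over every code point is far too slow for production use.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK or ICU4J.
final class NameLookup {

    private NameLookup() {}

    /**
     * Returns the first code point whose Character.getName() equals the
     * given name (after trimming and upper-casing), or -1 if none matches.
     * Every call scans the code point range linearly, so it is slow.
     */
    static int codePointFromName(String name) {
        String wanted = name.trim().toUpperCase(Locale.ROOT);
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // getName returns null for unassigned code points,
            // which never equals the wanted name.
            String candidate = Character.getName(cp);
            if (wanted.equals(candidate)) {
                return cp;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int leftQ  = codePointFromName("SINGLE LEFT-POINTING ANGLE QUOTATION MARK");
        int rightQ = codePointFromName("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK");
        System.out.printf("U+%04X U+%04X%n", leftQ, rightQ);
    }
}
```

A real version would presumably build the reverse index once from the same name data file that getName already reads, instead of scanning on every call.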

Thanks a lot!

--tom
