Tom,

Welcome back :-) Have you seen that cool \x{h...h}? Oh, you saw it :-)

Yes, it might be desirable to have a corresponding getCodePointFromName(String name). At the very least I will need it when I implement \N{unicode_name} in regex, but I'm not sure whether it is worth making it a method in j.l.Character or just keeping it as an implementation detail in j.u.regex. I believe it's cool/fun, and I went ahead and put j.l.Character.getName() into JDK7, but I'm sure some people are not convinced that this method is really useful enough for the "normal" developer. I'm also a little worried that the Unicode Standard keeps adding new characters with every release, so the data file will get bigger and bigger. I managed to keep the name data file at around 108k for 6.0; I hope 7.0 is not going to be too big.
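To illustrate the "keep it as an implementation detail in j.u.regex" option, here is a minimal sketch that supports \N{NAME} by rewriting each escape into the \x{h...h} form before compiling. It assumes nothing beyond the JDK 7 Character.getName(int); the class NamedEscapes and its methods are hypothetical, not how java.util.regex actually does it, and the linear name scan is deliberately naive (a real implementation would want an indexed table).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: support \N{NAME} by rewriting it to the \x{h...h}
 * escape before compiling. Not the java.util.regex implementation.
 */
public final class NamedEscapes {
    // A literal \N{...} in the regex source. Simplification: this does not
    // notice an escaped backslash in front of it (e.g. \\N{...}).
    private static final Pattern NAMED = Pattern.compile("\\\\N\\{([^}]*)\\}");

    // Slow but simple: scan every code point for a matching name.
    private static int codePointFromName(String name) {
        String wanted = name.trim().toUpperCase(Locale.ROOT);
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (wanted.equals(Character.getName(cp))) {   // getName is null if unassigned
                return cp;
            }
        }
        throw new IllegalArgumentException("Unknown character name: " + name);
    }

    /** Expands every \N{NAME} escape, then compiles the result. */
    public static Pattern compile(String regex) {
        Matcher m = NAMED.matcher(regex);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String hex = Integer.toHexString(codePointFromName(m.group(1)));
            // quoteReplacement keeps the backslash in \x{...} literal.
            m.appendReplacement(sb, Matcher.quoteReplacement("\\x{" + hex + "}"));
        }
        m.appendTail(sb);
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = compile("\\N{SINGLE LEFT-POINTING ANGLE QUOTATION MARK}(\\w+)");
        System.out.println(p.matcher("\u2039hi").matches());   // true
    }
}
```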

I will go through your other emails and file the corresponding CRs later, and see if we can get in several easy doc rewording/typo changes. It might be too late for JDK7 for those that might be categorized as "API changes". I'm still struggling with a nasty race-condition bug in my zip area, which I hope (and need) to close before next Monday, so forgive me if I suddenly go silent for a couple of days.

-Sherman


On 04-14-2011 7:51 PM, Tom Christiansen wrote:
Sherman,

The other code thing that I saw, but also of course did not fix given
where you are in the release cycle, was another of these mysterious
non-parallel things.  You have a

    String getName(int codePoint)

function (well, static method) which takes a code point (like U+0130) and
produces a string ("LATIN CAPITAL LETTER I WITH DOT ABOVE").  But you have
no corresponding inverse function!  You have get-name-from-char but no
get-char-from-name.

Have I maybe missed something?  Is it somewhere I didn't notice?

This is important so that people stop having to put ugly magic
numbers in their source code.  Which do you prefer, eh? :)

     int leftQ  = 0x2039;
     int rightQ = 0x203A;

vs:

     int leftQ  = getCharFromName("SINGLE LEFT-POINTING ANGLE QUOTATION MARK");
     int rightQ = getCharFromName("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK");
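A minimal sketch of such an inverse, assuming only the JDK 7 Character.getName(int): invert it once into a HashMap and look names up case-insensitively. The class UnicodeNames is hypothetical, getCharFromName is borrowed from ICU4J purely for illustration, and returning -1 for an unknown name is this sketch's own convention.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical inverse of Character.getName(int). The method name is
 * borrowed from ICU4J for illustration; it is not a JDK 7 method.
 */
public final class UnicodeNames {
    private static Map<String, Integer> byName;   // built lazily on first use

    // Invert Character.getName over every code point, once.
    private static synchronized Map<String, Integer> table() {
        if (byName == null) {
            Map<String, Integer> m = new HashMap<String, Integer>();
            for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
                String name = Character.getName(cp);   // null if unassigned
                if (name != null) {
                    m.put(name, cp);
                }
            }
            byName = m;
        }
        return byName;
    }

    /** Returns the code point with this Unicode name, or -1 if none matches. */
    public static int getCharFromName(String name) {
        Integer cp = table().get(name.trim().toUpperCase(Locale.ROOT));
        return cp == null ? -1 : cp;
    }

    public static void main(String[] args) {
        int leftQ  = getCharFromName("SINGLE LEFT-POINTING ANGLE QUOTATION MARK");
        int rightQ = getCharFromName("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK");
        System.out.printf("U+%04X U+%04X%n", leftQ, rightQ);   // U+2039 U+203A
    }
}
```

Building the table walks all 1,114,112 code points once, trading a one-time start-up cost for constant-time lookups afterwards. Later JDKs (9 and up) provide Character.codePointOf(String) for this lookup.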

See

     http://icu-project.org/apiref/icu4j/com/ibm/icu/lang/UCharacter.html#getCharFromName(java.lang.String)

which has these nifty paired, parallel functions so you can go both ways:

     static int getCharFromName(java.lang.String name)
           Finds a Unicode code point by its most current Unicode
           name and return its code point value.
     static int getCharFromName1_0(java.lang.String name)
           Find a Unicode character by its version 1.0 Unicode name and
           return its code point value.
     static int getCharFromNameAlias(java.lang.String name)
           Find a Unicode character by its corrected name alias and
           return its code point value.
     static java.lang.String getName(int ch)
           Returns the most current Unicode name of the argument code
           point, or null if the character is unassigned or outside the
           range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not
           have a name.
     static java.lang.String getName(java.lang.String s, java.lang.String separator)
           Returns the names for each of the characters in a string
     static java.lang.String getNameAlias(int ch)
           Returns the corrected name from NameAliases.txt if there is one.
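For comparison, the string-level getName(String, String) can be approximated with the JDK 7 API alone by walking the string's code points. This is only a sketch: the class StringNames is hypothetical, and the "<unnamed U+XXXX>" placeholder for unassigned code points is an assumption of this example, not ICU behaviour.

```java
/**
 * Hypothetical JDK-only analogue of ICU4J's
 * UCharacter.getName(String s, String separator); not a JDK API.
 */
public final class StringNames {
    /** Joins the Unicode names of every code point in s with separator. */
    public static String getName(String s, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);             // combines surrogate pairs
            if (i > 0) {
                sb.append(separator);
            }
            String name = Character.getName(cp);   // null if unassigned
            sb.append(name != null ? name : String.format("<unnamed U+%04X>", cp));
            i += Character.charCount(cp);          // step past 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LATIN CAPITAL LETTER I WITH DOT ABOVE, SINGLE LEFT-POINTING ANGLE QUOTATION MARK
        System.out.println(getName("\u0130\u2039", ", "));
    }
}
```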

Maybe this is something you might be able to consider for JDK8.
It's not really a bug like the other things, but it sure would
make sense to have, and be a great convenience.

Thanks a lot!

--tom
