On Wed, Aug 26, 2009 at 02:59, Ulf Zibis<[email protected]> wrote: > Hi all, > > I have put all important things of sun.nio.cs.Surrogate to java.* packages, > I guess. We likely could get rid of it.
I am in principle of adding the method isBMP to Character.java (I did write it!) but I was hoping we can find a better name for it... Hmmmmm..... The Unicode Glossary uses BMP consistently, so maybe isBMP is the best we can come up with? How about isBMPCodePoint? http://unicode.org/glossary/#BMP_code_point Yeah, I think that's not too bad. Martin > See: https://bugs.openjdk.java.net/show_bug.cgi?id=100104 > > > Some comments: > ================ > public static int toChars(int codePoint, char[] dst, int dstIndex) { > - if (codePoint < 0 || codePoint > MAX_CODE_POINT) { > - throw new IllegalArgumentException(); > - } > - if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) { > - dst[dstIndex] = (char) codePoint; > - return 1; > - } > - toSurrogates(codePoint, dst, dstIndex); > + if (codePoint >= MIN_CODE_POINT) { > + if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) { > + dst[dstIndex] = (char)codePoint; > + return 1; > + } > + if (codePoint <= MAX_CODE_POINT) > + return toSurrogates(codePoint, dst, dstIndex); > + } > + throw new IllegalArgumentException(); > + } > ---------------- > saves 1 compare in case of code point is in BMP Yes. A good change. > ================ > + public static char[] toChars(int codePoint) { > + if (codePoint >= MIN_CODE_POINT) { > + char[] result; > + if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) { > + result = C1.clone(); > ---------------- > I guess cloning is faster than new instantiation, as fresh one has to be > initialized by 0s, but not sure Cloning has to initialize as well, with the original contents. Often there is special support for zeroing newly allocated objects. So I think this is a bad idea. > ================ > + static int toSurrogates(int codePoint, char[] dst, int index) { > + // We write elements "backwards" to guarantee all-or-nothing // > TODO: isn't forward faster ? > + dst[index+1] = lowSurrogate(codePoint); > + dst[index] = highSurrogate(codePoint); > ---------------- > IMO this guarantee is nowhere needed, so why potentially waste performance ? All-or-nothing is a fundamental software reliability principle. Why have your data structure in a corrupted state if it is not too onerous to avoid it? The difference in behavior is certainly user-visible. > ================ > - // Pass 2: Allocate and fill in char[] > - char[] v = new char[n]; > - for (int i = offset, j = 0; i < offset + count; i++) { > - int c = codePoints[i]; > - if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT) { > - v[j++] = (char) c; > - } else { > - Character.toSurrogates(c, v, j); > - j += 2; > - } > - } > - > - this.value = v; > + // Pass 2: Allocate and fill in char[] > + for (int i = offset, j = 0; i < offset + count; i++) { > + int c = codePoints[i]; > + if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT) > + value[j++] = (char)c; > + else > + j += Character.toSurrogates(c, value, j); > } > ---------------- > I guess HotSpot is aware of immutability of String, so no need to locally > buffer v expletive to save indirection and against concurrent modifications. I like the simplification of this method a lot, but it actually helps hotspot a little to hold the new array in a local and assign it to a field at the end of the constructor. This sort of thing (copying a field into a local) is done in java.util.concurrent a lot. --- It would help to separate cosmetic and "real" changes to get them reviewed and accepted. Martin
