I'm hoping Xueming will review as well. + char asciiCheck; + for (asciiCheck = 0, p = str; *p != '\0'; p++) { + asciiCheck |= *p; + } + len = (p - str);
Probably conversion from ptrdiff_t to int needs a cast. Someday we might need to worry about input string longer than 2^31, but that's a pre-existing problem. The signed-ness of char being implementation-defined makes me nervous. Maybe do all the bit operations using explicit unsigned char, then the final test becomes simply asciiCheck <= 0x80 On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad <claes.redes...@oracle.com> wrote: > Hi Martin, > > thanks for reviewing! > > On 2017-06-12 22:14, Martin Buchholz wrote: > >> +/* Optimized for char set UTF-8 */ >> >> "charset" is a (poor misnomer!) jargon term, not two words. >> > > I got that from the existing use of "char set" in this file, but will fix > it in all places. > > >> --- >> >> + for (b = str[len]; b != '\0'; len++, b = str[len]) { >> + if (isAscii && b & 0x80) { >> + isAscii = JNI_FALSE; >> + } >> + } >> >> I would write this more like >> >> const signed char *p; >> int isAscii; >> >> for (isAscii = 0, p = (const signed char *) str; *p != '\0'; p++) isAscii >> &= (*p >= 0); >> > > Did you mean for isAscii to be initialized to 1 (true) and then be cleared > to 0 (false) when *p >= 0 is false? > > >> Then length is (p - str) >> > > How about something like this to hoist the second comparison from the loop: > > int len; > char asciiCheck; > for (asciiCheck = 0, p = str; *p != '\0'; p++) { > asciiCheck |= *p; > } > len = (p - str); > > if (asciiCheck & 0x80) { > // ascii fast-path > return newSizedString8859_1(env, str, len); > } > > ... > > --- >> >> + jbyteArray hab = NULL; >> >> I'm having a hard time decoding the name "hab" >> > > Not sure, but my guess is "heap allocated byteArray". > > >> --- >> >> The code below is not UTF-8 specific. Can it be refactored? >> >> + hab = (*env)->NewByteArray(env, len); >> + if (hab != 0) { >> + jclass strClazz = JNU_ClassString(env); >> + CHECK_NULL_RETURN(strClazz, 0); >> + (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte *)str); >> + result = (*env)->NewObject(env, strClazz, >> + String_init_ID, hab, jnuEncoding); >> + (*env)->DeleteLocalRef(env, hab); >> + return result; >> + } >> > > Yes, probably. The excessive copy-paste here is due to a crash issue I ran > into when > Charset.isSupported was called without proper initialization. It has since > been > resolved, but seems I forgot to revisit this part. > > I also realized I haven't added a test for this method. I'll look into > doing the > refactoring and adding some sanity testing tomorrow. > > >> --- >> >> We probably want to use unicode escapes in out java sources to keep all >> source files strictly ASCII. >> > > You mean in the test? Sure. > > Refreshed the jdk webrev: http://cr.openjdk.java.net/~re > destad/8181147/jdk.05/ - more to come. > > Thanks! > > /Claes > >