On 1/26/19 3:19 PM, David Holmes wrote:
> On 27/01/2019 3:08 am, Martin Buchholz wrote:
>> It's a pet peeve that the name GetStringUTFChars is deeply misleading -
>> there are many "UTF"s, and this encoding is meant for use with the JVM
>> only.  The documentation should make it clearer that this is NOT the UTF-8
>> you might expect.
>
> It does!
>
> GetStringUTFChars
>
> const char * GetStringUTFChars(JNIEnv *env, jstring string, jboolean *isCopy);
>
> Returns a pointer to an array of bytes representing the string in modified UTF-8 encoding.

This is pretty easy to miss, especially if you're not aware that the JVM and the JDK have this special concept of "modified UTF-8". Perhaps emphasis should be added. Or maybe occurrences of "modified UTF-8" should be changed to be links to the section in chapter 3 of the JNI spec where "modified UTF-8" is defined. (Making the occurrences be links might be emphasis enough.)

I think it would be far too troublesome to try to migrate the JNI methods to process real UTF-8 instead of modified UTF-8. That raises the question, though: is there a use case for processing real UTF-8 within JNI? For example, for interoperating with external components that expect real UTF-8. If so, perhaps some conversion methods could be added.
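To illustrate what such a conversion might look like, here is a rough Java-side sketch. The class and method names (ModifiedUtf8Converter, toStandardUtf8) are made up for this example and are not an existing or proposed API. It also leans on DataInputStream.readUTF, so it assumes the input fits in that method's 65535-byte limit, which a real JNI-level converter would not want to inherit.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class ModifiedUtf8Converter {

    /** Re-encodes modified UTF-8 bytes (no length prefix, no trailing NUL) as standard UTF-8. */
    public static byte[] toStandardUtf8(byte[] modified) {
        if (modified.length > 0xFFFF) {
            throw new IllegalArgumentException("too long for DataInputStream.readUTF");
        }
        // readUTF expects a 2-byte big-endian length in front of the data.
        byte[] framed = new byte[modified.length + 2];
        framed[0] = (byte) (modified.length >>> 8);
        framed[1] = (byte) modified.length;
        System.arraycopy(modified, 0, framed, 2, modified.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            String decoded = in.readUTF();                     // decodes modified UTF-8
            return decoded.getBytes(StandardCharsets.UTF_8);   // encodes standard UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Going through a String is the lazy way to do it; a native-side converter would presumably rewrite the bytes directly, turning each encoded surrogate pair into one four-byte sequence and the two-byte form of U+0000 into a single zero byte.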

(From Java code, the Charset encoders/decoders handle real UTF-8, which seems to cover most cases. Modified UTF-8 occurs only within serialization and Data{Input,Output}Stream.)
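For anyone who hasn't run into the difference, a small sketch that shows it from plain Java. The class name Utf8Comparison is invented for this example; the only library calls are String.getBytes and DataOutputStream.writeUTF. The two encodings agree on most text and differ for U+0000 and for supplementary characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
public class Utf8Comparison {

    /** Standard UTF-8, as produced by the Charset encoder. */
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Modified UTF-8, as written by DataOutputStream.writeUTF, without its 2-byte length prefix. */
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] framed = buf.toByteArray();
        return Arrays.copyOfRange(framed, 2, framed.length);
    }

    public static void main(String[] args) {
        String nul = "\u0000";            // U+0000
        String mugs = "\uD83C\uDF7B";     // U+1F37B, a supplementary character

        System.out.println(Arrays.toString(standardUtf8(nul)));   // [0]
        System.out.println(Arrays.toString(modifiedUtf8(nul)));   // [-64, -128]
        System.out.println(Arrays.toString(standardUtf8(mugs)));  // [-16, -97, -115, -69]
        System.out.println(Arrays.toString(modifiedUtf8(mugs)));  // [-19, -96, -68, -19, -67, -69]
    }
}
```

Standard UTF-8 encodes the supplementary character as one four-byte sequence, while modified UTF-8 encodes each surrogate separately as three bytes. External components that expect real UTF-8 will generally reject the six-byte form.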


Alan Snyder wrote:
> -16 -97 -115 -69

I'll drink to that!

s'marks
