Re: possible problem with JNI GetStringUTFChars

Stuart Marks Mon, 28 Jan 2019 14:14:09 -0800



On 1/26/19 3:19 PM, David Holmes wrote:

On 27/01/2019 3:08 am, Martin Buchholz wrote:

It's a pet peeve that the name GetStringUTFChars is deeply misleading -
there are many "UTF"s, and this encoding is meant for use with the JVM
only.  The documentation should make it clearer that this is NOT the UTF-8
you might expect.


It does!

GetStringUTFChars

const char * GetStringUTFChars(JNIEnv *env, jstring string, jboolean *isCopy);

Returns a pointer to an array of bytes representing the string in modified UTF-8 encoding.

This is pretty easy to miss, especially if you're not aware that the JVM and the JDK have this special concept of "modified UTF-8". Perhaps emphasis should be added. Or maybe occurrences of "modified UTF-8" should be changed to be links to the section in chapter 3 of the JNI spec where "modified UTF-8" is defined. (Making the occurrences be links might be emphasis enough.)

I think it would be far too troublesome to try to migrate the JNI methods to process real UTF-8 instead of modified UTF-8. That raises the question, though: is there a use case for processing real UTF-8 within JNI? For example, for interoperating with external components that expect real UTF-8. If so, perhaps some conversion methods could be added.
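
A minimal sketch of what such a conversion might look like, written on the Java side rather than as a JNI function. The class and method names are hypothetical and are not part of the JDK or the JNI spec. It assumes the input is a complete modified UTF-8 byte sequence of at most 65535 bytes, since it borrows DataInputStream.readUTF, which expects a two-byte length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not a JDK or JNI API.
public class ModifiedUtf8Converter {

    // Converts modified UTF-8 bytes (no length prefix) into real UTF-8 bytes.
    static byte[] modifiedToRealUtf8(byte[] modified) {
        if (modified.length > 0xFFFF) {
            throw new IllegalArgumentException("readUTF handles at most 65535 bytes");
        }
        // readUTF expects a big-endian unsigned 16-bit length before the data.
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        framed.write(modified.length >>> 8);
        framed.write(modified.length);
        framed.write(modified, 0, modified.length);

        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(framed.toByteArray()))) {
            // Decode modified UTF-8 into a String, then re-encode as real UTF-8.
            String s = in.readUTF();
            return s.getBytes(StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Malformed input surfaces as UTFDataFormatException.
            throw new UncheckedIOException(e);
        }
    }
}
```

In practice, native code that needs real UTF-8 can often avoid the conversion entirely by having the Java side pass the result of String.getBytes(StandardCharsets.UTF_8) as a byte[] instead of passing the jstring.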

(From Java code, the Charset encoders/decoders handle real UTF-8, which seems to cover most cases. Modified UTF-8 occurs only within serialization and Data{Input,Output}Stream.)
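
To make the difference concrete, here is a small sketch that encodes the same strings both ways from Java. The class and method names are illustrative assumptions. The two encodings differ in exactly two cases: U+0000, and supplementary characters such as U+1F37B, whose real UTF-8 form is the four bytes quoted further down this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative comparison only; names are not from the JDK.
public class Utf8Comparison {

    // Real UTF-8, as produced by the standard Charset encoder.
    static byte[] realUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8, as written by DataOutputStream.writeUTF.
    // writeUTF prepends a two-byte length, which is stripped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String nul = "\0";
        String mugs = new String(Character.toChars(0x1F37B));

        // U+0000: one zero byte in real UTF-8, two bytes (C0 80) in modified.
        System.out.println(Arrays.toString(realUtf8(nul)));      // [0]
        System.out.println(Arrays.toString(modifiedUtf8(nul)));  // [-64, -128]

        // U+1F37B: four bytes in real UTF-8; in modified UTF-8 each
        // surrogate is encoded separately, giving six bytes.
        System.out.println(Arrays.toString(realUtf8(mugs)));     // [-16, -97, -115, -69]
        System.out.println(Arrays.toString(modifiedUtf8(mugs))); // [-19, -96, -68, -19, -67, -69]
    }
}
```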



Alan Snyder wrote:

-16 -97 -115 -69


I'll drink to that!

s'marks
