On 1/26/19 3:19 PM, David Holmes wrote:
> On 27/01/2019 3:08 am, Martin Buchholz wrote:
>> It's a pet peeve that the name GetStringUTFChars is deeply misleading -
>> there are many "UTF"s, and this encoding is meant for use with the JVM
>> only. The documentation should make it clearer that this is NOT the UTF-8
>> you might expect.
>
> It does!
>
> GetStringUTFChars
>
> const char * GetStringUTFChars(JNIEnv *env, jstring string, jboolean *isCopy);
>
> Returns a pointer to an array of bytes representing the string in modified UTF-8
> encoding.
This is pretty easy to miss, especially if you're not aware that the JVM and the
JDK have this special concept of "modified UTF-8". Perhaps emphasis should be
added. Or maybe occurrences of "modified UTF-8" should be changed to be links to
the section in chapter 3 of the JNI spec where "modified UTF-8" is defined.
(Making the occurrences be links might be emphasis enough.)
I think it would be far too troublesome to try to migrate the JNI methods to
process real UTF-8 instead of modified UTF-8. That raises the question, though:
is there a use case for processing real UTF-8 within JNI? For example, for
interoperating with external components that expect real UTF-8. If so, perhaps
some conversion methods could be added.
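
To make the idea of a conversion method concrete, here is a rough sketch,
written in Java rather than as a JNI function. It takes bytes in modified
UTF-8 (no length prefix, as GetStringUTFChars would hand them back) and
returns standard UTF-8. The class and method names are hypothetical, and
reusing DataInputStream.readUTF as the decoder is an assumption made for
brevity; it limits the input to 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of JNI.
final class ModifiedUtf8Converter {

    // Converts modified UTF-8 bytes to standard UTF-8 bytes.
    static byte[] toStandardUtf8(byte[] modified) {
        if (modified.length > 0xFFFF) {
            throw new IllegalArgumentException("input too long for readUTF");
        }
        // readUTF expects a 2-byte big-endian length before the payload.
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        framed.write(modified.length >>> 8);
        framed.write(modified.length & 0xFF);
        framed.write(modified, 0, modified.length);

        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(framed.toByteArray()))) {
            // Decode modified UTF-8 into a String (UTF-16 internally).
            String decoded = in.readUTF();
            // Re-encode with the standard UTF-8 charset.
            return decoded.getBytes(StandardCharsets.UTF_8);
        } catch (IOException e) {
            // readUTF reports malformed input as UTFDataFormatException.
            throw new IllegalArgumentException("malformed modified UTF-8", e);
        }
    }
}
```

With this, the two-byte form C0 80 becomes a single zero byte, and a
supplementary character encoded as two three-byte surrogates becomes one
four-byte sequence.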
(From Java code, the Charset encoders/decoders handle real UTF-8, which seems to
cover most cases. Modified UTF-8 occurs only within serialization and
Data{Input,Output}Stream.)
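
For anyone who hasn't run into the difference, a small sketch that encodes
the same strings both ways: with the UTF-8 Charset, and with
DataOutputStream.writeUTF (whose 2-byte length prefix is stripped here). The
class and method names are made up for illustration. The two encodings agree
on most text and differ for U+0000 and for supplementary characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class comparing the two encodings.
final class Utf8Comparison {

    // Standard UTF-8, as produced by the Charset encoders.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8, as produced by DataOutputStream.writeUTF.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Only expected when the encoded form exceeds 65535 bytes.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        // Drop the 2-byte length that writeUTF puts in front.
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // Plain ASCII, the NUL character, and U+1F37B (a surrogate pair).
        String[] samples = { "abc", "\u0000", "\uD83C\uDF7B" };
        for (String s : samples) {
            System.out.println(Arrays.toString(standardUtf8(s))
                    + " vs " + Arrays.toString(modifiedUtf8(s)));
        }
    }
}
```

Running main prints:

    [97, 98, 99] vs [97, 98, 99]
    [0] vs [-64, -128]
    [-16, -97, -115, -69] vs [-19, -96, -68, -19, -67, -69]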
Alan Snyder wrote:
-16 -97 -115 -69
I'll drink to that!
s'marks