My question was not about why it does what it does, but why it still does that. Is there a valid use of this primitive that depends upon it returning something other than true UTF-8?
It may not have been an issue to you, but it was to me when I discovered my program could not handle certain file names. I’ll bet I’m not the last person to assume that a primitive named GetStringUTFChars returns UTF-8. I have fixed my code, so it’s not an issue for me any more, but it seems like an unnecessary tarpit awaiting the unwary.

Just my 2c.

  Alan

> On Jan 24, 2019, at 10:04 PM, David Holmes <david.hol...@oracle.com> wrote:
>
> On 25/01/2019 4:39 am, Alan Snyder wrote:
>> Thank you. That post does explain what is happening, but leaves open the
>> question of whether GetStringUTFChars should be changed.
>> What is the value of the current implementation of GetStringUTFChars versus
>> one that returns true UTF-8?
>
> Well that's really a Hotspot question as it concerns JNI, but this is ancient
> history. There's little point musing over the "why" of decisions made back in
> the late 1990's. But I suspect the main reason is the avoidance of embedded
> NUL characters.
>
> The only bug report I can see on this (basically the same issue you are
> reporting) was back in 2004:
>
> https://bugs.openjdk.java.net/browse/JDK-5030776
>
> so it simply has not been an issue. As per the SO article that Claes
> referenced anyone needing true UTF8 has a couple of paths to achieve that.
>
> Cheers,
> David
> -----
>
>
>> Alan
>>> On Jan 24, 2019, at 10:32 AM, Claes Redestad <claes.redes...@oracle.com>
>>> wrote:
>>>
>>> Hi Alan,
>>>
>>> GetStringUTFChars unfortunately doesn't give you true UTF-8, but a modified
>>> UTF-8 sequence
>>> as used by the VM internally for historical reasons.
>>>
>>> See answers to this related question on SO (which contains links to
>>> official docs):
>>> https://stackoverflow.com/questions/32205446/getting-true-utf-8-characters-in-java-jni
>>>
>>> HTH
>>>
>>> /Claes
>>>
>>> On 2019-01-24 19:23, Alan Snyder wrote:
>>>> I am having a problem with file names that contain emojis when passed to a
>>>> macOS system call.
>>>>
>>>> Things work when I convert the path to bytes in Java, but fail (file not
>>>> found) when I convert the path to bytes in native code using
>>>> GetStringUTFChars.
>>>>
>>>> For example, where String.getBytes() returns
>>>>
>>>> -16 -97 -115 -69
>>>>
>>>> GetStringUTFChars returns:
>>>>
>>>> -19 -96 -68 -19 -67 -69
>>>>
>>>> I’m not a UTF expert, so can someone say whether I should file a bug
>>>> report?
>>>>
>>>> (Tested in JDK 9, 11, and a fairly recent 12)
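
For anyone who wants to see the difference without writing native code, the two byte sequences quoted in the original report can be reproduced in pure Java. This is only a minimal sketch, under a few assumptions: DataOutputStream.writeUTF is documented to emit modified UTF-8 behind a two-byte length prefix, and I am assuming that this matches what GetStringUTFChars produces for the string in question. The class name ModifiedUtf8Demo and its helper methods are made up for illustration, and the emoji U+1F37B is inferred from the reported bytes rather than stated in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the JDK or of the original report.
public class ModifiedUtf8Demo {

    // Standard UTF-8: a supplementary code point becomes one 4-byte sequence.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF: each UTF-16
    // surrogate is encoded separately as 3 bytes, and U+0000 becomes the
    // two bytes 0xC0 0x80. Assumed here to match GetStringUTFChars.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s); // fails if the encoded form exceeds 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        // Drop the 2-byte length prefix that writeUTF puts in front.
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    public static void main(String[] args) {
        String emoji = "\uD83C\uDF7B"; // U+1F37B as a surrogate pair
        System.out.println(Arrays.toString(standardUtf8(emoji)));
        // prints [-16, -97, -115, -69]
        System.out.println(Arrays.toString(modifiedUtf8(emoji)));
        // prints [-19, -96, -68, -19, -67, -69]
    }
}
```

On the native side, one of the paths mentioned in the SO article is to do the conversion in Java (as in standardUtf8 above) and pass the resulting byte[] across JNI instead of calling GetStringUTFChars.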