Re: possible problem with JNI GetStringUTFChars

Alan Snyder Fri, 25 Jan 2019 09:32:02 -0800

My question was not about why it does what it does, but why it still does that. 
Is there a valid use of this primitive that depends upon it returning something 
other than true UTF-8?


It may not have been an issue to you, but it was to me when I discovered my 
program could not handle certain file names. I’ll bet I’m not the last person 
to assume that a primitive named GetStringUTFChars returns UTF.

I have fixed my code, so it's not an issue for me any more, but it seems like an 
unnecessary tarpit awaiting the unwary.
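
For anyone who hits the same thing, here is a minimal sketch of the difference, 
written in Java so it runs without any native code. It assumes the character in 
question is U+1F37B (inferred from the bytes quoted further down in this thread) 
and uses DataOutputStream.writeUTF as a stand-in for the modified UTF-8 encoding 
that GetStringUTFChars produces. The class and method names are made up for 
illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the JDK or of the original program.
class Utf8Comparison {

    // Standard UTF-8: a supplementary character becomes one 4-byte sequence.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF: each UTF-16
    // surrogate is encoded on its own as a 3-byte sequence. writeUTF also
    // prepends a 2-byte length, which is stripped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    public static void main(String[] args) {
        // Assumed code point, inferred from the bytes reported in the thread.
        String s = new String(Character.toChars(0x1F37B));
        System.out.println(Arrays.toString(standardUtf8(s)));
        // prints [-16, -97, -115, -69]
        System.out.println(Arrays.toString(modifiedUtf8(s)));
        // prints [-19, -96, -68, -19, -67, -69]
    }
}
```

A system call that expects a standard UTF-8 path needs the first form, which is 
why converting the path to bytes on the Java side and passing the byte array to 
native code avoids the problem.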

Just my 2c.

  Alan


> On Jan 24, 2019, at 10:04 PM, David Holmes <[email protected]> wrote:
> 
> On 25/01/2019 4:39 am, Alan Snyder wrote:
>> Thank you. That post does explain what is happening, but leaves open the 
>> question of whether GetStringUTFChars should be changed.
>> What is the value of the current implementation of GetStringUTFChars versus 
>> one that returns true UTF-8?
> 
> Well, that's really a Hotspot question as it concerns JNI, but this is ancient 
> history. There's little point musing over the "why" of decisions made back in 
> the late 1990s. But I suspect the main reason is the avoidance of embedded 
> NUL characters.
> 
> The only bug report I can see on this (basically the same issue you are 
> reporting) was back in 2004:
> 
> https://bugs.openjdk.java.net/browse/JDK-5030776
> 
> so it simply has not been an issue. As per the SO article that Claes 
> referenced, anyone needing true UTF-8 has a couple of paths to achieve that.
> 
> Cheers,
> David
> -----
> 
> 
>>   Alan
>>> On Jan 24, 2019, at 10:32 AM, Claes Redestad <[email protected]> 
>>> wrote:
>>> 
>>> Hi Alan,
>>> 
>>> GetStringUTFChars unfortunately doesn't give you true UTF-8, but a modified 
>>> UTF-8 sequence as used by the VM internally for historical reasons.
>>> 
>>> See answers to this related question on SO (which contains links to 
>>> official docs):
>>> https://stackoverflow.com/questions/32205446/getting-true-utf-8-characters-in-java-jni
>>> 
>>> HTH
>>> 
>>> /Claes
>>> 
>>> On 2019-01-24 19:23, Alan Snyder wrote:
>>>> I am having a problem with file names that contain emojis when passed to a 
>>>> macOS system call.
>>>> 
>>>> Things work when I convert the path to bytes in Java, but fail (file not 
>>>> found) when I convert the path to bytes in native code using 
>>>> GetStringUTFChars.
>>>> 
>>>> For example, where String.getBytes() returns
>>>> 
>>>> -16 -97 -115 -69
>>>> 
>>>> GetStringUTFChars returns:
>>>> 
>>>> -19 -96 -68 -19 -67 -69
>>>> 
>>>> I’m not a UTF expert, so can someone say whether I should file a bug 
>>>> report?
>>>> 
>>>> (Tested in JDK 9, 11, and a fairly recent 12)
>>>> 
>>> 
>
