Re: Unexpected behaviour with larger Strings

Andrew Haley Thu, 23 Apr 2020 03:01:14 -0700

On 4/20/20 6:09 PM, Adam Retter wrote:

> I was surprised by my findings that:
>
> 1. On JDK 7 and 8 with HotSpot - getting the bytes of a UTF-8 string
> where all chars are '0' wants to allocate an array larger than the VM
> limit, whereas the same operation on ASCII and ISO-8859-1 do not. If I
> am not mistaken then the char '0' takes up the same amount of bytes
> (i.e. 1 byte) in ASCII, ISO-8859-1, and UTF-8.


Yes, but the code doing the allocation doesn't know that the chars
are all '0', and it isn't going to scan to find out.
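As a rough illustration of why the charset matters even when the
contents do not, here is a minimal sketch that computes a worst-case
output size from the charset alone, using the public
CharsetEncoder.maxBytesPerChar() value. The class and method names
(WorstCaseSize, worstCaseBytes) and the chosen length are hypothetical,
and this is not the JDK's actual allocation code; it only assumes that
an encoder which refuses to scan the input must size for the largest
possible expansion.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WorstCaseSize {
    // Hypothetical helper: upper bound on the encoded size, derived from
    // the charset only, without looking at the actual characters.
    static long worstCaseBytes(Charset cs, int charCount) {
        float maxBytesPerChar = cs.newEncoder().maxBytesPerChar();
        return (long) Math.ceil((double) charCount * maxBytesPerChar);
    }

    public static void main(String[] args) {
        // Assumed length for illustration: a large but legal String length.
        int charCount = Integer.MAX_VALUE / 2;
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            long bound = worstCaseBytes(cs, charCount);
            // A byte[] cannot be longer than Integer.MAX_VALUE elements.
            String note = bound > Integer.MAX_VALUE ? " (too large for an array)" : "";
            System.out.println(cs + ": " + bound + " bytes worst case" + note);
        }
    }
}
```

For the single-byte charsets the bound equals the char count, while for
UTF-8 it is a multiple of it, which is enough to push a very long
string past the maximum array size even if every char is '0'.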

> Now obviously I am happy to see that this passes on JDK 10+ with
> HotSpot :-) But shouldn't it also pass for J9? Also why is there such
> a variation in the errors, I would have hoped that such simple Java
> code was "portable".

Did you step through the code to see what was happening? Strings were
substantially rewritten between these releases, and one of the
decisions made was to use byte arrays rather than char arrays for the
contents of the string.
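To make the effect of that decision concrete, here is a small model of
the storage cost under the two representations. It assumes, as I
understand the newer design, that a byte-array-backed string uses one
byte per char when every char fits in the Latin-1 range and two bytes
per char otherwise. The class and method names are hypothetical, object
headers are ignored, and the real java.lang.String internals are not
reachable from user code.

```java
final class CompactTextSketch {
    // Hypothetical helper: true when every char fits in a single byte
    // (the Latin-1 range, 0x00 to 0xFF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Content size when the string is backed by a char[]: always 2 bytes per char.
    static long charArrayBytes(String s) {
        return 2L * s.length();
    }

    // Content size when the string is backed by a byte[] (assumed model):
    // 1 byte per char if all chars are Latin-1, otherwise 2 bytes per char.
    static long byteArrayBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2L * s.length();
    }

    public static void main(String[] args) {
        String zeros = "0000";
        System.out.println("char[] model: " + charArrayBytes(zeros) + " bytes");
        System.out.println("byte[] model: " + byteArrayBytes(zeros) + " bytes");
    }
}
```

Under this model a string of '0' chars already sits in memory as one
byte per char, so an encoder for a single-byte-compatible charset has
far less work to do than one starting from a char array.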

Finally, please bear in mind what Strings are intended for. The
UTF-encoded Complete Works of Shakespeare occupy 5.5 MB, and IMVHO any
"string" that can contain the Complete Works is large enough for any
reasonable use.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
