On Thu, 21 Sep 2023 22:48:56 GMT, Claes Redestad <[email protected]> wrote:
>> `URLEncoder` currently appends chars that needs encoding into a
>> `java.io.CharArrayWriter`, converts that to a `String`, uses
>> `String::getBytes` to get the encoded bytes and then appends these bytes in
>> a escaped manner to the output stream. This is somewhat inefficient.
>>
>> This PR replaces the `CharArrayWriter` with a reusable `CharBuffer` +
>> `ByteBuffer` pair. This allows us to encode to the output `StringBuilder` in
>> small chunks, with greatly reduced allocation as a result.
>>
>> The exact size of the buffers is an open question, but generally it seems
>> that a tiny buffer wins by virtue of allocating less, and that the per chunk
>> overheads are relatively small.
>
> Claes Redestad has updated the pull request incrementally with one additional
> commit since the last revision:
>
> REPLACE, add comment, add tests verifying that surrogate pairs at boundary
> works.
src/java.base/share/classes/java/net/URLEncoder.java line 250:
> 248: CharsetEncoder ce = charset.newEncoder()
> 249: .onMalformedInput(CodingErrorAction.REPLACE)
> 250: .onUnmappableCharacter(CodingErrorAction.REPLACE);
@dfuch The explicit use of the REPLACE action makes me wonder if the spec
should be clarified at some point as it doesn't seem to be documented.
src/java.base/share/classes/java/net/URLEncoder.java line 296:
> 294: }
> 295:
> 296: private static void flushToStringBuilder(StringBuilder out,
> CharsetEncoder ce, CharBuffer cb, ByteBuffer bb, boolean endOfInput) {
This method would benefit from a comment to summarise the parameters, only
because there are so many.
Also just a formatting nit is that this line is much longer than everything
else in the file so maybe side-by-side diffs a bit difficult to look at.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/15865#discussion_r1333933819
PR Review Comment: https://git.openjdk.org/jdk/pull/15865#discussion_r1333933207