On Sat, 26 Oct 2024 15:48:11 GMT, Markus KARG <d...@openjdk.org> wrote:

> This Pull Request proposes an implementation for 
> [JDK-8343110](https://bugs.openjdk.org/browse/JDK-8343110): Adding the new 
> method `public void getChars(int srcBegin, int srcEnd, char[] dst, int 
> dstBegin)` to the `CharSequence` interface, providing a **bulk-read** 
> facility including a default implementation iterating over `charAt(int)`.
> 
> In addition, this Pull Request proposes to replace the implementation of 
> `Reader.of(CharSequence).read(char[] cbuf, int off, int len)` to invoke 
> `CharSequence.getChars(next, next + n, cbuf, off)` instead of utilizing 
> pattern matching for switch. Also, this PR proposes to implement 
> `CharBuffer.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)` as 
> an alias for `CharBuffer.get(srcBegin, dst, dstBegin, srcEnd - srcBegin)`.
> 
> To ensure quality...
> * ...the method signature and JavaDocs are adapted from 
> `AbstractStringBuilder.getChars(...)`.
> * ...this PR relies upon the existing tests for `Reader.of(CharSequence)`, as 
> these provide sufficient coverage of all changes introduced by this PR.

Chen,

I do see your point and like to ask everybody to chime in and comment about the 
alternatives we have. In the end, we need to hear more people than just the two 
of us to decide *which* of both alternatives to finally implement:


* Alternative A: `CharSequence::getChars(int, int, char[], int)` - Fast, but as 
risky as `Reader::read(char[], int, int len)`

This alternative provides best possible performance, but rogue implementations 
of `CharSequence::getChars(int, int, char[], int)` might access the passed 
array *beyond* the requested limits. This allows poisoning and spying.

There already are APIs in the JRE which pose that exact risk *since very long 
time*: `Reader::read(char[], int, int len)`, just as one example, exists since 
Java 1.x, so there could exist lots of rogue custom `Reader` implementations 
already. As a consequence, IMHO we can expect Java developers to be very 
familiar with this kind of risk (and to have set up strategies to deal with it) 
*for decades*. As those rogue `Reader`s *already could* exist, developers 
*unaware* of this kind of risk actually are in very big trouble *already*. 
Hence IMHO de-facto we gain *nothing* if we prevent a well-known risk at point 
B that exists since 30 years at point A. *Iff* we really want to get rid of 
that kind of risk, we need to talk about adding array-views/sub-arrays to the 
Java language (which is illusory *in the next years*) or about removing 
existing core APIs (which is illusory *absolutely*).

Developers will tackle this risk at `CharSequence::getChars` just in the very 
same way as they did it for 30 years at `Reader::read`: Either they bear the 
risk, or they go with the code you proposed (intermediate array-copy in 
untrusted cases). The JRE itself even does that internally, see 
https://github.com/openjdk/jdk/commit/b0d145097cdc61e4bab19393a125e63aa3bc29b9 
for example.


* Alternative B: `CharSequence::subsequence(int, int).getChars(char[], int)` - 
Risk-free but possibly very slow

This implementation prevents the risk implied by Alternative A, but for the 
price of an additional (possibly expensive) copy performed *always* (*not just* 
in the untrusted case). While JIT eventually *might* optimize away this costs 
for *some* implementations, it might *not* for interpreted code, non-HotSpot 
JVMs, or I/O-bound implementations. `CharSequence` implementations *are not 
forced to* hold its data in-memory. In the I/O-bound case, imposing 
`subsequence` *might* possibly impose *even worse* performance than a loop over 
`getChar(int)` (assumption in this example: I/O duplication forces I/O 
writes/flushes, but I/O loop-over-read-char will fill a buffer once and read 
from that). As this new API proposal *solely* is about improving performance, 
this actually would be counter-productive.


My personal conclusion is: As the risk exists since 30 years, it will not be a 
problem for developers. Let's pave the way for performance.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21730#issuecomment-2745384387

Reply via email to