RFC: 8357183: Improving efficiency of Writer::append(CharSequence) and Writer::append(CharSequence, int, int) / Sub Task of 8356679: Using CharSequence::getChars internally

Markus KARG Sat, 17 May 2025 08:01:27 -0700

Roger,

thank you for your comments.

Following your advice, I have split the larger work of JDK-8356679 into sub-tasks.

I would like to start with a first PR implementing the *foundational* work, i.e. optimizing Writer::append for efficiency (JDK-8357183). For convenience, a copy of the description is attached below.


Comments Welcome!

If everybody is fine with this, I would be happy to get some +1s to publish the PR for JDK-8357183!


-Markus

This sub-task of JDK-8356679 proposes implementation-only changes to java.io.Writer. No API is changed. Implementation behavior changes slightly with respect to self-calls to non-private methods.

The goal is to make the Writer::append(CharSequence) and Writer::append(CharSequence, int, int) methods as efficient *in the non-String* case as they already are *in the String* case.

This is achieved by now handling CharSequence in *the exact same* optimized way as String was specifically handled before. This generalization of the previously String-specific optimization is possible since Java 25, thanks to the new CharSequence::getChars(int, int, char[], int) bulk-read method.

Prior to the proposed change, the code was inefficient: on top of what the String case did, the text content of other CharSequences (like StringBuilder and CharBuffer) had to be copied *once or several times* before it was finally processed *as* a String.
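
A minimal sketch of the pre-change path, for illustration only. It assumes that the old default code went through subSequence and String.valueOf before calling write(String); the class LegacyAppendSketch and the helper legacyAppend are hypothetical names and not part of the JDK:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.CharBuffer;

public class LegacyAppendSketch {

    // Hypothetical helper mirroring the assumed pre-change default path.
    static Writer legacyAppend(Writer out, CharSequence csq, int start, int end) throws IOException {
        if (csq == null) {
            csq = "null";
        }
        // Step 1: a temporary sub sequence object (for StringBuilder this
        // already copies the partial text into a new String).
        CharSequence sub = csq.subSequence(start, end);
        // Step 2: an intermediate String (for CharBuffer this copies the
        // characters; for a String it is a no-op).
        String str = String.valueOf(sub);
        // Step 3: only now is the text processed as a String.
        out.write(str);
        return out;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        legacyAppend(out, CharBuffer.wrap("abcdef"), 1, 4);
        System.out.println(out); // prints "bcd"
    }
}
```

The output is the same as with a direct bulk copy; the difference lies only in the temporary objects created along the way.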


The changes in detail are:

* Extract the original implementation of Writer::write(String, int, int) into a newly introduced internal method, Writer::implWrite(CharSequence, int, int). This allows other methods to use that exact original implementation, particularly Writer::append, but potentially also subclasses in the same package (for subsequent enhancement of specific Writers).
* Writer::append(CharSequence) avoids the previously implied creation of an intermediate String representation in the non-String case (which implied creating another temporary object on the heap and duplicating the full text content), on top of what the String case did.
* Writer::append(CharSequence, int, int), in addition to the optimization of the single-arg case, also avoids the previously implied creation of a sub sequence (which always meant creating another temporary object on the heap, and in many cases meant duplicating partial text content).
* As a benefit, the JavaDocs of these methods can be simplified thanks to the reduced number of self-calls. While this changes the internal behavior slightly, it finally clarifies that these JavaDoc sections were originally meant as explanatory notes on what *the default code* does, not as mandatory specs of what *subclasses* have to do.
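
The idea behind these changes can be sketched roughly as follows. This is not the proposed patch: BulkAppendSketch, bulkAppend and copyChars are hypothetical names, and copyChars is only a stand-in for CharSequence::getChars(int, int, char[], int) so that the sketch also compiles on JDKs older than 25.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.CharBuffer;

public class BulkAppendSketch {

    // Hypothetical stand-in for CharSequence::getChars(int, int, char[], int).
    // On Java 25 the whole body could presumably be a single getChars call.
    static void copyChars(CharSequence csq, int srcBegin, int srcEnd, char[] dst, int dstBegin) {
        if (csq instanceof String) {
            ((String) csq).getChars(srcBegin, srcEnd, dst, dstBegin);
        } else if (csq instanceof StringBuilder) {
            ((StringBuilder) csq).getChars(srcBegin, srcEnd, dst, dstBegin);
        } else {
            // Fallback: char-by-char read for all other CharSequences.
            for (int i = srcBegin; i < srcEnd; i++) {
                dst[dstBegin++] = csq.charAt(i);
            }
        }
    }

    // Hypothetical helper: writes csq[start, end) without creating a
    // sub sequence or an intermediate String.
    static Writer bulkAppend(Writer out, CharSequence csq, int start, int end) throws IOException {
        if (csq == null) {
            csq = "null";
        }
        if (start < 0 || start > end || end > csq.length()) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        int len = end - start;
        char[] buf = new char[len];          // the only temporary object
        copyChars(csq, start, end, buf, 0);  // one bulk copy of the requested range
        out.write(buf, 0, len);
        return out;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        bulkAppend(out, new StringBuilder("Hello, Writer"), 7, 13);
        bulkAppend(out, CharBuffer.wrap("abcdef"), 1, 4);
        System.out.println(out); // prints "Writerbcd"
    }
}
```

Unlike the real Writer, this sketch allocates a fresh buffer on every call and does no locking; it only illustrates that the requested range is copied exactly once.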

This work is foundational to subsequent enhancements of specific Writers, as they are to call the new implWrite(CharSequence, int, int) method *instead* of the previously called write(String, int, int) method, to allow for the same efficiency optimization now found in the new default code. For that reason, no other Writers are changed in this first step; these Writers will follow in subsequent JBS issues / PRs.



On 14.05.2025 at 15:57, Roger Riggs wrote:

Hi Markus,

Starting out with the common case is a good idea for the first PR.
I much prefer a PR with a single goal that comes to a conclusion and does not add new features or changes after the PR is submitted. I tend to lose interest in PRs with lots of churn; it means I have to re-review the bulk of it when there is a change, and I may wait days to let it settle down before coming back to it.
I tend to think the PR was not really ready to be reviewed if simple issues and corrections have to be made frequently. Do your own checking for typos and copyrights and simple refactoring before opening the PR.
Quality before quantity or speed.
I'm fine with separate Jira issues that clearly delineate a specific scope and goal.
The title of this issue (8356679) doesn't identify the real goal.
It seems to be to improve performance or memory usage, not just to use a new API.
These are my personal opinions about contributions and process.

Regards, Roger


On 5/14/25 6:48 AM, Markus KARG wrote:
Many of the modified classes derive from a common super class and share one needed common change (which is one of the points that are easy to see once you see all of those classes in a single PR, but hard to explain in plain-text pre-PR mailing list threads), so at least those need to be discussed *together*. But to spare JBS issues and PRs, I can open the PR with just the first set of changes, and once we agree that this set is fine, I can push the next commit *in the same PR*. Otherwise we would need endless JBS issues, mailing list threads, and PRs, just to fix a dozen internal code lines.
Having said that, does the current state of this thread count as "reached common agreement to file a PR", or do I still have to wait until more people chime in?
-Markus


On 13.05.2025 at 15:10, Roger Riggs wrote:
Hi Markus,

A main point was to avoid trying to do everything at once.
The PR comments become hard to follow and intermingled, and it takes longer to get agreement because of the thrash in the PR.
Roger

On 5/13/25 5:05 AM, Markus KARG wrote:
Thank you, Roger.
Actually the method helps in the "toString()" variants, too, as in some places we could *get rid* of "toString()" (which is more work than "just" a buffer due to the added compression complexity).
In fact, I already took the time to rewrite *all* of them while waiting for the approval of this list posting. In *all* cases *less* buffering / copying is needed, and *less* "toString()" conversion (which is a copy under the hood) is needed. So if I were allowed to show the code as a PR, it would be much easier to explain and discuss.
A PR is the best place to discuss "how the code would change". In the worst case, let's drop it if we see that it is actually a bad thing.
-Markus


On 12.05.2025 at 20:18, Roger Riggs wrote:
Hi Markus,

On the surface, it looks constructive.
I suspect that many of these cases will turn into discussions about the right/best/better way to buffer the characters. The getChars method only helps when extracting to a char array; many of the current implementations create strings as the intermediary. The advantage of the one-character-at-a-time technique is not needing a (separately allocated) buffer.
Consider taking a few at a time before launching into the whole set.

$.02, Roger

On 5/11/25 2:45 AM, Markus KARG wrote:
Dear Core Libs Team,

I am hereby requesting comments on JDK-8356679.
I would like to invest some time and set up a PR implementing Chen Liang's proposal laid out in https://bugs.openjdk.org/browse/JDK-8356679. For your convenience, the text of that JBS issue is copied below. According to the Developer's Guide, I need to get broad agreement BEFORE filing a PR. Therefore, I kindly ask everybody to briefly show consent, so I may file a PR.
Thanks
-Markus


Copy from https://bugs.openjdk.org/browse/JDK-8356679:
Recently OpenJDK adopted the new method CharSequence::getChars(int, int, char[], int) for inclusion in Java 25. As a bulk reader method, it allows potentially improved efficiency over the previously available char-by-char reader method CharSequence::charAt(int).
Chen Liang suggested on March 23rd on the core-libs-dev mailing list to use the new method within the internal source code of OpenJDK for the implementation of Appendables (see https://mail.openjdk.org/pipermail/core-libs-dev/2025-March/141521.html). The idea behind this is that the implementations might then be more efficient.
A quick analysis of the OpenJDK source code identified (at least) the following classes which could potentially run more efficiently when using CharSequence::getChars internally, thanks to bulk reading and / or prevention of internal copies / toString() conversions:
* java.io.Writer
* java.io.StringWriter
* java.io.PrintWriter
* java.io.BufferedWriter
* java.io.CharArrayWriter
* java.io.FileWriter
* java.io.OutputStreamWriter
* sun.nio.cs.StreamEncoder
* java.io.PrintStream
* java.nio.CharBuffer
In the sense of "eat your own dog food", it makes sense to implement Chen's idea in (at least) those classes. Possibly more classes could be identified when taking a deeper look. Besides the potential efficiency improvements, it would be a good showcase for the usage of the new API.
The risk of this change should be low, as test coverage exists, and as the intended changes are solely internal to the implementation. No API will be changed. In some cases the JavaDocs will be slightly adapted where they currently expose the actual implementation (so that they do not lie in the future).
