Hi Don,

On 2021-08-17 19:05, Donald Raab wrote:
Hi Claes,

so the win comes from forEach on the synchronized collection implicitly
being synchronized atomically, whereas foreach will synchronize on
each iterator operation and thus allow racy modification of elements
in between iteration, right?

Yes, this win will be applied to any type that does something special in 
forEach. For instance we have MultiReader collections in Eclipse Collections, 
which take a read lock on forEach, and throws UnsupportedOperationException on 
iterator. This change would make String.join safe to use for these as well, 
without requiring an explicit lock in client code.

Thanks, I think this will help inform the decision-making.



Seems reasonable to consider, though it'd have to percolate through to most 
production deployments before you could use it - and even then
it seems somewhat fragile. And isn't this just one of many utility
methods that would have to be fixed to avoid issues with hidden
iterators completely?


Yes, this is one small step. Synchronized collections wouldn’t become less 
tricky to use, but there would be one less hidden iterator to worry about.

Just thinking in terms of whether this ought to be a concerted effort.
And perhaps some paranoid fear it might spiral out of control unless
we're wary. :-)

Fixing the OpenJDK implementation is one thing, but I think this ought
to be also considered on a specification (and JCK) level to avoid
creating a dependency on the implementation.



String.join was recently changed to this by JDK-8265237[1]:

        for (CharSequence cs: elements) {
            if (size >= elems.length) {
                elems = Arrays.copyOf(elems, elems.length << 1);
            }
            elems[size++] = String.valueOf(cs);
        }


This looks like the code for the overloaded String.join which takes an array. I 
was referring to only the version that takes Iterable.

I'm looking at the Iterable-based implementation in openjdk/jdk:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L3320


Of course the body in the new code could be extracted to the same
effect. The drawback in either implementation is that it would by
necessity be a capturing lambda, so might add a small allocation
overhead. I'm biased towards avoiding performance overheads, but maybe
someone else can make a call on whether this is a worthwhile thing to
attempt.


There is a trade-off here. You remove the need to create an iterator for all 
types (due to foreach loop), and replace with forEach with a method reference, 
which IIRC might be slightly better than having a lambda. The underlying 
collection (let’s say ArrayList) will be called which will also not create an 
iterator but iterate directly over the ArrayList internal array.

Method references and lambdas should boil down to the same thing at
runtime. The differentiator for performance is when you need to capture
some state, in which the lambda won't be a singleton but instead each
invocation will allocate an object that captures the state. I believe
this would happen with joiner::add pre-JDK-8265237 (the method ref will
capture joiner).

But you're right that for many (most?) collections then forEach is
specialized and we could become more or less performance neutral. I've
prototyped a solution using a local Consumer class, along with a
microbenchmark, here:

https://github.com/openjdk/jdk/pull/5152/files

Both baseline and prototype allocates about as much (ArrayList-based),
while the prototype actually comes out slightly faster:

Benchmark (count) (length) (mode) Mode Cnt Score Error Units StringJoinerBenchmark.joinIterable 5 1 latin avgt 10 362.882 ± 6.789 ns/op # 18-b09 baseline StringJoinerBenchmark.joinIterable 5 1 latin avgt 10 394.977 ± 9.448 ns/op # prototype

Collections without a forEach specialization might suffer since the default method will go via the iterator, but that should be acceptable
from a performance point of view.

Thanks,
Claes


Thanks,
Don

Thanks!

/Claes

[1] https://bugs.openjdk.java.net/browse/JDK-8265237

On 2021-08-17 06:55, Donald Raab wrote:
The following code:
for (CharSequence cs: elements) {
     joiner.add(cs);
}
Can be replaced with:
elements.forEach(joiner::add);
This would have the minor benefit of making it safe to use String.join with 
synchronized collections without requiring a client side lock. There are likely 
other opportunities like this in the JDK.
Also, with the addition of forEach on Iterable, and Predicate in Java 8, the 
following design FAQ is outdated.
https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a6
 
<https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a6>
Thanks,
Don

Reply via email to