Re: New convenience methods on Stream

Stuart Marks Tue, 04 May 2021 23:12:48 -0700

Hi Don,

When evaluating new APIs I always find it helpful and educational to look for casesin real code where they might be applied, and what effect the API has at the callsite. The search need not be exhaustive, but it's probably sufficient to find anumber of representative examples. This does take some effort, though. For now I'lltake a look at some examples where your first item can be applied:

1. Stream contents into a mutable collection created by a Supplier.

default <R extends Collection<T>> R toCollection(Supplier<R> supplier)
{
    return this.collect(Collectors.toCollection(supplier));
}

Usage Examples:

HashSet<String> set = stream.toCollection(HashSet::new);
TreeSet<String> sortedSet = stream.toCollection(TreeSet::new);
ArrayDeque<String> deque = stream.toCollection(ArrayDeque::new);

Since I have the JDK handy I searched it for 'toCollection('. There are around 60hits -- but note that 2/3 of these are in java.sql.rowset and refer to an unrelatedAPI. Some are in comments, and some are the implementation ofCollectors.toCollection itself, which leaves just 14 cases. Let's look at a few.



# src/jdk.compiler/share/classes/com/sun/tools/javac/api/JavacTaskPool.java:164


        List<String> opts =
                StreamSupport.stream(options.spliterator(), false)
                             .collect(Collectors.toCollection(ArrayList::new));

Using the proposed API here would result in:

        List<String> opts =
                StreamSupport.stream(options.spliterator(), false)
                             .toCollection(ArrayList::new));

This makes the code a little bit nicer. A static import would also haved helped,though not quite as much as the new API:


        List<String> opts =
                StreamSupport.stream(options.spliterator(), false)
                             .collect(toCollection(ArrayList::new)));

I also note that after some analysis of the usage of the resulting List, it's nevermodified -- indeed, it's used as the key of a Map -- so this could be replaced withthe recently-added Stream::toList.



# src/jdk.compiler/share/classes/com/sun/tools/javac/main/Option.java:381

Set<String> platforms = StreamSupport.stream(providers.spliterator(),false).flatMap(provider ->StreamSupport.stream(provider.getSupportedPlatformNames()


                .spliterator(),

        false))

.collect(Collectors.toCollection(LinkedHashSet :: new));

(Sorry for line wrapping. This file has some long lines.) Again, using the proposalAPI would shorten things a bit, but it doesn't really make much difference withinthe overall context:

Set<String> platforms = StreamSupport.stream(providers.spliterator(),false).flatMap(provider ->StreamSupport.stream(provider.getSupportedPlatformNames()


                .spliterator(),

        false))
                                                 .toCollection(LinkedHashSet :: 
new));


# src/java.logging/share/classes/java/util/logging/LogManager.java:2138


            final Map<String, TreeSet<String>> loggerConfigs =

allKeys.collect(Collectors.groupingBy(ConfigProperty::getLoggerName,
                                    TreeMap::new,
                                    Collectors.toCollection(TreeSet::new)));

This is an interesting case, as the toCollection() method is being used to produce a"downstream" collector passed to groupingBy(). Since the proposed API is on stream,it can't be used here. There are a few other cases like this where toCollection isused as a downstream collector, not as an argument to Stream::collect.

#src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/HtmlConfiguration.java:377



    public List<DocPath> getAdditionalStylesheets() {
        return options.additionalStylesheets().stream()
                .map(ssf -> DocFile.createFileForInput(this, ssf))
                .map(file -> DocPath.create(file.getName()))
                .collect(Collectors.toCollection(ArrayList::new));
    }

This is another place where the proposed API can be used straightforwardly:

    public List<DocPath> getAdditionalStylesheets() {
        return options.additionalStylesheets().stream()
                .map(ssf -> DocFile.createFileForInput(this, ssf))
                .map(file -> DocPath.create(file.getName()))
                .toCollection(ArrayList::new));
    }

#src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/toolkit/util/IndexBuilder.java:220

This is a slightly different case, as it uses a lambda to pass a comparator to aconstructor instead of using a constructor reference. Before:


    public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
        Objects.requireNonNull(kind);

return itemsByCategory.getOrDefault(IndexItem.Category.TAGS,Collections.emptySortedSet()).stream()

                .filter(i -> i.isKind(kind))
                .collect(Collectors.toCollection(() -> new 
TreeSet<>(mainComparator)));
    }

After:

    public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
        Objects.requireNonNull(kind);

return itemsByCategory.getOrDefault(IndexItem.Category.TAGS,Collections.emptySortedSet()).stream()

                .filter(i -> i.isKind(kind))
                .toCollection(() -> new TreeSet<>(mainComparator)));
    }


*****


There are a few other cases in the JDK but they don't seem to lead to any new 
insights.

Some observations.

- Using this API saves 19 characters compared to Collectors::toCollection, but itsaves only eight characters compared to Collectors::toCollection with a static import.

- Using this API doesn't relieve the calling code of any burden of tedious orerror-prone code. The code it replaces is quite straightforward.

- There don't appear to be any opportunities for optimization. In order to handlethe parallel case, this pretty much is required to delegate to the collector. Isuppose the serial case could be handled specially, but it boils down toconstructing the destination and then calling add() on it repeatedly, which ispretty much what the collector ends up doing in the serial case anyway.

- Cases seem to occur quite infrequently compared to others such asCollectors::toList and Stream::toList.

- Some cases of Collectors::toCollection are used as "downstream" collectors, thatis, passed to other collectors instead of Stream::collect. This narrows the range ofpossible uses of the API still further.


 - There is a recurring pattern

     Collectors.toCollection(ArrayList::new)

This is useful in place of Collectors::toList for places where the code wants toguarantee the result to be an ArrayList. (Even though Collectors::toList returns anArrayList, it isn't specified to do so.) But there are cases (such as the one Ilooked at above) where the return list isn't actually modified -- and indeed itwould be an error if it were modified -- so Stream::toList could be used just aswell for those.

- The JDK is not necessarily a representative code base, but frequencies here doseem to mirror what I've seen in open source: Collectors::toCollection is much lessfrequent than Collectors::toList.

- There doesn't appear to be any semantic difference between the proposedStream::toCollection and the existing Collectors::toCollection.

Based on these observations I'm having a hard time mustering much enthusiasm forthis API.

You might ask, hasn't the JDK added other convenience APIs? There have probably beena few one-liners, but we are really trying to keep them to a minimum. Mainly,convenience APIs are indeed convenient, but in many cases they add a lot of value inother ways as well. Here are some examples.

- Stream::toList. We discussed this recently, so I won't restate everything.Briefly, though, this can be used as a replacement for Collectors::toList, which isused VERY frequently, it provides stronger semantics, and it's faster because itavoids extra array allocation and copying.

- String::repeat. Repeating a String is a simple for-loop. However, if you look atthe implementation [1] there really is a lot going on here. It's a lot faster thanthe straightforward code, because it peels off a few special cases, it uses a cleverdoubling algorithm to call arraycopy a minimal number of times, and it deals withthings at the byte level, so it can create a String without any copying or anycodeset conversion. In addition to convenience, the value here is that it's muchfaster than a simple for-loop that one might write instead. It also leverages theJDK-internal String representation, which means that it does less allocation andcopying than other utility libraries would.

[1]https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/lang/String.java#L3560

- InputStream::transferTo. This is mostly a straightforward loop [2], but thedetails are devilishly hard to get right. If you look at user code that does copyinglike this, it usually gets some edge case wrong, for example, not handling partialreads. The value provided here is not that it's faster than the code it wouldreplace, but that it relieves the caller of the responsibility of writing a loopthat's easy to get wrong (or to form a dependency on another library that has thisutility method).

[2]https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/io/InputStream.java#L777

Anyway, this is the sort of analysis and justification that I'd like to see forconvenience APIs. Such APIs need to be more than just a shorthand for something abit longer.


s'marks

Re: New convenience methods on Stream

Reply via email to