Hi Don,

When evaluating new APIs I always find it helpful and educational to look for cases in real code where they might be applied, and what effect the API has at the call site. The search need not be exhaustive, but it's probably sufficient to find a number of representative examples. This does take some effort, though. For now I'll take a look at some examples where your first item can be applied:

1. Stream contents into a mutable collection created by a Supplier.
default <R extends Collection<T>> R toCollection(Supplier<R> supplier)
{
    return this.collect(Collectors.toCollection(supplier));
}

Usage Examples:

HashSet<String> set = stream.toCollection(HashSet::new);
TreeSet<String> sortedSet = stream.toCollection(TreeSet::new);
ArrayDeque<String> deque = stream.toCollection(ArrayDeque::new);

Since I have the JDK handy I searched it for 'toCollection('. There are around 60 hits -- but note that 2/3 of these are in java.sql.rowset and refer to an unrelated API. Some are in comments, and some are the implementation of Collectors.toCollection itself, which leaves just 14 cases. Let's look at a few.


# src/jdk.compiler/share/classes/com/sun/tools/javac/api/JavacTaskPool.java:164


        List<String> opts =
                StreamSupport.stream(options.spliterator(), false)
                             .collect(Collectors.toCollection(ArrayList::new));

Using the proposed API here would result in:

        List<String> opts =
                StreamSupport.stream(options.spliterator(), false)
                             .toCollection(ArrayList::new));

This makes the code a little bit nicer. A static import would also haved helped, though not quite as much as the new API:

        List<String> opts =
                StreamSupport.stream(options.spliterator(), false)
                             .collect(toCollection(ArrayList::new)));

I also note that after some analysis of the usage of the resulting List, it's never modified -- indeed, it's used as the key of a Map -- so this could be replaced with the recently-added Stream::toList.


# src/jdk.compiler/share/classes/com/sun/tools/javac/main/Option.java:381


Set<String> platforms = StreamSupport.stream(providers.spliterator(), false) .flatMap(provider -> StreamSupport.stream(provider.getSupportedPlatformNames()

                .spliterator(),

        false))

.collect(Collectors.toCollection(LinkedHashSet :: new));

(Sorry for line wrapping. This file has some long lines.) Again, using the proposal API would shorten things a bit, but it doesn't really make much difference within the overall context:

Set<String> platforms = StreamSupport.stream(providers.spliterator(), false) .flatMap(provider -> StreamSupport.stream(provider.getSupportedPlatformNames()

                .spliterator(),

        false))
                                                 .toCollection(LinkedHashSet :: 
new));


# src/java.logging/share/classes/java/util/logging/LogManager.java:2138


            final Map<String, TreeSet<String>> loggerConfigs =

allKeys.collect(Collectors.groupingBy(ConfigProperty::getLoggerName,
                                    TreeMap::new,
                                    Collectors.toCollection(TreeSet::new)));

This is an interesting case, as the toCollection() method is being used to produce a "downstream" collector passed to groupingBy(). Since the proposed API is on stream, it can't be used here. There are a few other cases like this where toCollection is used as a downstream collector, not as an argument to Stream::collect.


# src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/HtmlConfiguration.java:377


    public List<DocPath> getAdditionalStylesheets() {
        return options.additionalStylesheets().stream()
                .map(ssf -> DocFile.createFileForInput(this, ssf))
                .map(file -> DocPath.create(file.getName()))
                .collect(Collectors.toCollection(ArrayList::new));
    }

This is another place where the proposed API can be used straightforwardly:

    public List<DocPath> getAdditionalStylesheets() {
        return options.additionalStylesheets().stream()
                .map(ssf -> DocFile.createFileForInput(this, ssf))
                .map(file -> DocPath.create(file.getName()))
                .toCollection(ArrayList::new));
    }


# src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/toolkit/util/IndexBuilder.java:220


This is a slightly different case, as it uses a lambda to pass a comparator to a constructor instead of using a constructor reference. Before:

    public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
        Objects.requireNonNull(kind);
return itemsByCategory.getOrDefault(IndexItem.Category.TAGS, Collections.emptySortedSet()).stream()
                .filter(i -> i.isKind(kind))
                .collect(Collectors.toCollection(() -> new 
TreeSet<>(mainComparator)));
    }

After:

    public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
        Objects.requireNonNull(kind);
return itemsByCategory.getOrDefault(IndexItem.Category.TAGS, Collections.emptySortedSet()).stream()
                .filter(i -> i.isKind(kind))
                .toCollection(() -> new TreeSet<>(mainComparator)));
    }


*****


There are a few other cases in the JDK but they don't seem to lead to any new 
insights.

Some observations.

- Using this API saves 19 characters compared to Collectors::toCollection, but it saves only eight characters compared to Collectors::toCollection with a static import.

- Using this API doesn't relieve the calling code of any burden of tedious or error-prone code. The code it replaces is quite straightforward.

- There don't appear to be any opportunities for optimization. In order to handle the parallel case, this pretty much is required to delegate to the collector. I suppose the serial case could be handled specially, but it boils down to constructing the destination and then calling add() on it repeatedly, which is pretty much what the collector ends up doing in the serial case anyway.

- Cases seem to occur quite infrequently compared to others such as Collectors::toList and Stream::toList.

- Some cases of Collectors::toCollection are used as "downstream" collectors, that is, passed to other collectors instead of Stream::collect. This narrows the range of possible uses of the API still further.

 - There is a recurring pattern

     Collectors.toCollection(ArrayList::new)

This is useful in place of Collectors::toList for places where the code wants to guarantee the result to be an ArrayList. (Even though Collectors::toList returns an ArrayList, it isn't specified to do so.) But there are cases (such as the one I looked at above) where the return list isn't actually modified -- and indeed it would be an error if it were modified -- so Stream::toList could be used just as well for those.

- The JDK is not necessarily a representative code base, but frequencies here do seem to mirror what I've seen in open source: Collectors::toCollection is much less frequent than Collectors::toList.

- There doesn't appear to be any semantic difference between the proposed Stream::toCollection and the existing Collectors::toCollection.

Based on these observations I'm having a hard time mustering much enthusiasm for this API.

You might ask, hasn't the JDK added other convenience APIs? There have probably been a few one-liners, but we are really trying to keep them to a minimum. Mainly, convenience APIs are indeed convenient, but in many cases they add a lot of value in other ways as well. Here are some examples.

- Stream::toList. We discussed this recently, so I won't restate everything. Briefly, though, this can be used as a replacement for Collectors::toList, which is used VERY frequently, it provides stronger semantics, and it's faster because it avoids extra array allocation and copying.

- String::repeat. Repeating a String is a simple for-loop. However, if you look at the implementation [1] there really is a lot going on here. It's a lot faster than the straightforward code, because it peels off a few special cases, it uses a clever doubling algorithm to call arraycopy a minimal number of times, and it deals with things at the byte level, so it can create a String without any copying or any codeset conversion. In addition to convenience, the value here is that it's much faster than a simple for-loop that one might write instead. It also leverages the JDK-internal String representation, which means that it does less allocation and copying than other utility libraries would.

[1] https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/lang/String.java#L3560

- InputStream::transferTo. This is mostly a straightforward loop [2], but the details are devilishly hard to get right. If you look at user code that does copying like this, it usually gets some edge case wrong, for example, not handling partial reads. The value provided here is not that it's faster than the code it would replace, but that it relieves the caller of the responsibility of writing a loop that's easy to get wrong (or to form a dependency on another library that has this utility method).

[2] https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/io/InputStream.java#L777

Anyway, this is the sort of analysis and justification that I'd like to see for convenience APIs. Such APIs need to be more than just a shorthand for something a bit longer.

s'marks

Reply via email to