Hi Paul,

On 06/11/18 19:10, Paul Sandoz wrote:
Hi Peter,

I like it and can see it being useful, thanks for sharing.

I am hesitating a little about it being in the JDK because there is the larger 
abstraction of a BiStream, where a similar form of collection would naturally 
fit (but perhaps without the intersection constraints for the 
characteristics?). We experimented a few times with BiStream and got quite far
but decided to pull back due to the lack of value types and specialized generics.
So I dunno how this might turn out in the future and if your BiCollector fits
nicely into such a future model.

What are your thoughts on this?

Well, I don't see the need to pack the two results into a Map.Entry (or a similar container) as a drawback. It is certainly not a performance drawback, because the packing does not happen per stream element, but only per final result or per intermediate accumulation result (the latter only in a parallel, non-CONCURRENT scenario). In a non-parallel scenario, only one Map.Entry object is created, or two if the collector is not IDENTITY_FINISH.
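
The allocation count can be checked with a small experiment. What follows is only a minimal sketch under stated assumptions: EntryCountDemo, pair() and both() are hypothetical names made up for this illustration, and the collector is built with Collector.of without any characteristics, so it corresponds to the non-IDENTITY_FINISH case.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo, not part of the proposal: counts pair allocations.
class EntryCountDemo {
    static final AtomicInteger ENTRIES = new AtomicInteger();

    // Every pair container is created here so that it can be counted.
    static <A, B> Map.Entry<A, B> pair(A a, B b) {
        ENTRIES.incrementAndGet();
        return new AbstractMap.SimpleImmutableEntry<>(a, b);
    }

    // Same shape as the BiCollector idea, but built with Collector.of and
    // no characteristics, so the finisher always runs.
    static <T, A1, A2, R1, R2> Collector<T, ?, Map.Entry<R1, R2>> both(
            Collector<T, A1, R1> c1, Collector<T, A2, R2> c2) {
        Supplier<Map.Entry<A1, A2>> supplier =
            () -> pair(c1.supplier().get(), c2.supplier().get());
        BiConsumer<Map.Entry<A1, A2>, T> accumulator = (acc, t) -> {
            c1.accumulator().accept(acc.getKey(), t);
            c2.accumulator().accept(acc.getValue(), t);
        };
        BinaryOperator<Map.Entry<A1, A2>> combiner = (x, y) -> pair(
            c1.combiner().apply(x.getKey(), y.getKey()),
            c2.combiner().apply(x.getValue(), y.getValue()));
        Function<Map.Entry<A1, A2>, Map.Entry<R1, R2>> finisher = acc -> pair(
            c1.finisher().apply(acc.getKey()),
            c2.finisher().apply(acc.getValue()));
        return Collector.of(supplier, accumulator, combiner, finisher);
    }

    public static void main(String[] args) {
        Collector<Integer, ?, Map.Entry<List<Integer>, Long>> c =
            both(Collectors.<Integer>toList(), Collectors.<Integer>counting());
        Map.Entry<List<Integer>, Long> result = Stream.of(1, 2, 3, 4, 5).collect(c);
        System.out.println(result);
        // Sequential stream: one pair from the supplier, one from the finisher.
        System.out.println("pairs allocated: " + ENTRIES.get()); // 2
    }
}
```

The count does not depend on the number of stream elements. On a parallel stream it should grow with the number of splits instead, since each leaf task asks the supplier for a container and each merge goes through the combiner.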

I also don't see a larger abstraction like BiStream as a natural fit for this kind of thing. As I understand the BiStream attempts (maybe I haven't seen the right ones?), they are more about passing pairs of elements down the pipeline. BiCollector, OTOH, "splits" the single-element pipeline at the final collection stage, with the purpose of constructing two independent collections from a single pass over the original single-element Stream. This is more about the single pass than anything else. And a single pass, I think, can inevitably execute only a single collection strategy (CONCURRENT vs. non-CONCURRENT), regardless of the type of the stream (Stream vs. BiStream). Or have you prototyped a combined strategy in BiStream?

Regards, Peter

FWIW I would call it a “splitting” or “bisecting” collector, e.g.
“s.collect(bisecting(…))”

Paul.




On Jun 11, 2018, at 5:39 AM, Peter Levart <peter.lev...@gmail.com> wrote:

Hi,

Have you ever wanted to perform a collection of the same Stream into two different 
targets using two Collectors? Say you wanted to collect Map.Entry elements into two 
parallel lists, each of them containing keys and values respectively. Or you wanted 
to collect elements into groups by some key, but also count them at the same time?
Currently this is not possible to do with a single Stream. You have to create two 
identical streams, so you end up passing Supplier<Stream> to other methods 
instead of bare Stream.
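
For comparison, the two-pass workaround described above might look roughly like this. It is only an illustrative sketch: TwoPassDemo and unzip() are hypothetical names, not existing JDK API.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; none of these names come from the JDK.
class TwoPassDemo {

    // A Stream can be consumed only once, so the caller has to hand over a
    // Supplier that can rebuild an identical stream for each pass.
    static <K, V> Map.Entry<List<K>, List<V>> unzip(
            Supplier<Stream<Map.Entry<K, V>>> source) {
        List<K> keys = source.get()                 // pass 1
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
        List<V> values = source.get()               // pass 2
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
        return new AbstractMap.SimpleImmutableEntry<>(keys, values);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("one", 1);
        map.put("two", 2);
        Supplier<Stream<Map.Entry<String, Integer>>> source =
            () -> map.entrySet().stream();
        System.out.println(unzip(source));
    }
}
```

Besides walking the source twice, this only works when the source can be replayed at all, which is not the case for streams backed by I/O or by a one-shot generator.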

I created a little utility Collector implementation that serves the purpose 
quite well:

import java.util.AbstractMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

/**
 * A {@link Collector} implementation that takes two delegate collectors and
 * produces a result composed of the two delegates' results, wrapped in a
 * {@link Map.Entry} object.
 *
 * @param <T> the type of elements collected
 * @param <K> the type of the 1st delegate collector's result
 * @param <V> the type of the 2nd delegate collector's result
 */
public class BiCollector<T, K, V>
    implements Collector<T, Map.Entry<Object, Object>, Map.Entry<K, V>> {
     private final Collector<T, Object, K> keyCollector;
     private final Collector<T, Object, V> valCollector;

     @SuppressWarnings("unchecked")
     public BiCollector(Collector<T, ?, K> keyCollector,
                        Collector<T, ?, V> valCollector) {
         this.keyCollector = (Collector) Objects.requireNonNull(keyCollector);
         this.valCollector = (Collector) Objects.requireNonNull(valCollector);
     }

     @Override
     public Supplier<Map.Entry<Object, Object>> supplier() {
         Supplier<Object> keySupplier = keyCollector.supplier();
         Supplier<Object> valSupplier = valCollector.supplier();
         return () -> new AbstractMap.SimpleImmutableEntry<>(
             keySupplier.get(),
             valSupplier.get()
         );
     }

     @Override
     public BiConsumer<Map.Entry<Object, Object>, T> accumulator() {
         BiConsumer<Object, T> keyAccumulator = keyCollector.accumulator();
         BiConsumer<Object, T> valAccumulator = valCollector.accumulator();
         return (accumulation, t) -> {
             keyAccumulator.accept(accumulation.getKey(), t);
             valAccumulator.accept(accumulation.getValue(), t);
         };
     }

     @Override
     public BinaryOperator<Map.Entry<Object, Object>> combiner() {
         BinaryOperator<Object> keyCombiner = keyCollector.combiner();
         BinaryOperator<Object> valCombiner = valCollector.combiner();
         return (accumulation1, accumulation2) -> new AbstractMap.SimpleImmutableEntry<>(
             keyCombiner.apply(accumulation1.getKey(), accumulation2.getKey()),
             valCombiner.apply(accumulation1.getValue(), accumulation2.getValue())
         );
     }

     @Override
     public Function<Map.Entry<Object, Object>, Map.Entry<K, V>> finisher() {
         Function<Object, K> keyFinisher = keyCollector.finisher();
         Function<Object, V> valFinisher = valCollector.finisher();
         return accumulation -> new AbstractMap.SimpleImmutableEntry<>(
             keyFinisher.apply(accumulation.getKey()),
             valFinisher.apply(accumulation.getValue())
         );
     }

     @Override
     public Set<Characteristics> characteristics() {
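         // EnumSet.copyOf() throws IllegalArgumentException when given an empty
         // collection that is not an EnumSet (such as Collections.emptySet(),
         // which a delegate may return), so start from an empty EnumSet instead.
         EnumSet<Characteristics> intersection = EnumSet.noneOf(Characteristics.class);
         intersection.addAll(keyCollector.characteristics());
         intersection.retainAll(valCollector.characteristics());
         return intersection;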
     }
}
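
To make the characteristics intersection concrete, here is a small standalone sketch. CharacteristicsDemo and intersect() are hypothetical names used only for illustration. It builds the intersection from EnumSet.noneOf, because EnumSet.copyOf is documented to throw IllegalArgumentException for an empty collection that is not itself an EnumSet. Which characteristics a particular JDK collector reports is an implementation detail, so the printed sets may differ between releases.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the proposal.
class CharacteristicsDemo {

    // Intersection of two characteristics sets; safe for empty inputs.
    static Set<Characteristics> intersect(Set<Characteristics> a,
                                          Set<Characteristics> b) {
        EnumSet<Characteristics> result = EnumSet.noneOf(Characteristics.class);
        result.addAll(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        // A characteristic survives only if both delegates report it.
        System.out.println(intersect(
            Collectors.toList().characteristics(),
            Collectors.toSet().characteristics()));
        System.out.println(intersect(
            Collectors.counting().characteristics(),
            Collectors.toList().characteristics()));

        // The pitfall: copyOf cannot infer the enum type from an empty set.
        try {
            EnumSet.copyOf(Collections.<Characteristics>emptySet());
        } catch (IllegalArgumentException e) {
            System.out.println("copyOf rejected the empty set");
        }
    }
}
```

The intersection is the conservative choice: the combined collector may claim CONCURRENT or UNORDERED only when both delegates tolerate that mode, since a single pass drives both of them in the same way.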


Do you think this class is general enough to be part of the standard Collectors
repertoire?

For example, accessed via a factory method Collectors.toBoth(Collector coll1,
Collector coll2), bi-collection could then be coded simply as:

         Map<String, Integer> map = ...

         Map.Entry<List<String>, List<Integer>> keys_values =
             map.entrySet()
                .stream()
                .collect(
                    toBoth(
                        mapping(Map.Entry::getKey, toList()),
                        mapping(Map.Entry::getValue, toList())
                    )
                );


         Map.Entry<Map<Integer, Long>, Long> histogram_count =
             ThreadLocalRandom
                 .current()
                 .ints(100, 0, 10)
                 .boxed()
                 .collect(
                     toBoth(
                         groupingBy(Function.identity(), counting()),
                         counting()
                     )
                 );


Regards, Peter

