Yep, since we can pass lambdas I guess it is fine; otherwise we would have to use proxies to hide the mutation. But I don't think we need to be that purist to move to a more expressive DSL.
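The BiConsumer mutation discussed in the quoted message below can be seen concretely by expressing the four CombineFn parts as a plain `java.util.stream.Collector`. A minimal sketch; the `SumCount` accumulator type and the averaging computation are illustrative choices, not Beam code:

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CollectorVsCombineFn {
    // Illustrative accumulator type, playing the role of a CombineFn accumulator.
    static final class SumCount {
        long sum;
        long count;
    }

    public static void main(String[] args) {
        // The four Collector parts, each annotated with the CombineFn method
        // it approximately corresponds to.
        Collector<Integer, SumCount, Double> averaging = Collector.of(
                SumCount::new,                               // createAccumulator (supplier)
                (acc, x) -> { acc.sum += x; acc.count++; },  // addInput (BiConsumer: must mutate)
                (a, b) -> { a.sum += b.sum; a.count += b.count; return a; }, // mergeAccumulators (combiner)
                acc -> (double) acc.sum / acc.count);        // extractOutput (finisher)

        System.out.println(Stream.of(1, 2, 3, 4).collect(averaging)); // 2.5
    }
}
```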
On 13 March 2018 at 19:49, "Ben Chambers" <bjchamb...@gmail.com> wrote:

> The CombineFn API has three type parameters (input, accumulator, and
> output) and methods that approximately correspond to those parts of the
> collector:
>
> CombineFn.createAccumulator = supplier
> CombineFn.addInput = accumulator
> CombineFn.mergeAccumulator = combiner
> CombineFn.extractOutput = finisher
>
> That said, the Collector API has some minimal, cosmetic differences, such
> as that CombineFn.addInput may either mutate the accumulator or return it,
> while the Collector accumulator method is a BiConsumer, meaning it must mutate.
>
> On Tue, Mar 13, 2018 at 11:39 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> It misses the collect split in three (supplier, combiner, aggregator), but
>> globally I agree.
>>
>> I'd just take Java Stream, remove the "client" methods or make them big-data
>> friendly if possible, ensure all hooks are serializable to avoid hacks, and
>> add an unwrap to be able to access the pipeline in case we need a very
>> custom thing, and we are done as far as I'm concerned.
>>
>> On 13 March 2018 at 19:26, "Ben Chambers" <bchamb...@apache.org> wrote:
>>
>>> I think the existing rationale (not introducing lots of special fluent
>>> methods) makes sense. However, if we look at the Java Stream API, we
>>> probably wouldn't need to introduce *a lot* of fluent builders to get most
>>> of the functionality. Specifically, if we focus on map, flatMap, and
>>> collect from the Stream API, plus a few extensions, we get something like:
>>>
>>> * collection.map(DoFn) for applying a ParDo
>>> * collection.map(SerializableFn) as Java 8 lambda shorthand
>>> * collection.flatMap(SerializableFn) as Java 8 lambda shorthand
>>> * collection.collect(CombineFn) for applying a CombineFn
>>> * collection.apply(PTransform) for applying a composite transform; note
>>>   that PTransforms could also use serializable lambdas for their definition.
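The fluent methods listed above can be sketched as a thin layer that only delegates to apply(). This is a toy, eager, in-memory sketch; the `Coll` and `SerFn` names and the list-backed evaluation are assumptions for illustration, not the Beam API:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FluentWrapperSketch {
    // Hypothetical serializable lambda type (stand-in for a SerializableFn).
    @FunctionalInterface
    interface SerFn<T, R> extends Function<T, R>, Serializable {}

    // Toy "PCollection" holding values in memory; the real one is deferred.
    static final class Coll<T> {
        final List<T> values;
        Coll(List<T> values) { this.values = values; }

        // apply() stays the single generic entry point, as in Beam today.
        <R> Coll<R> apply(Function<Coll<T>, Coll<R>> transform) {
            return transform.apply(this);
        }

        // map() is fluent sugar that delegates to apply(): a Stream-like DSL
        // layered on the core API rather than replacing it.
        <R> Coll<R> map(SerFn<T, R> fn) {
            return apply(in -> {
                List<R> out = new ArrayList<>();
                for (T t : in.values) out.add(fn.apply(t));
                return new Coll<>(out);
            });
        }
    }

    public static void main(String[] args) {
        Coll<String> words = new Coll<>(Arrays.asList("a", "bb", "ccc"));
        Coll<Integer> lengths = words.map(String::length);
        System.out.println(lengths.values); // [1, 2, 3]
    }
}
```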
>>> (Note that GroupByKey doesn't even show up here -- it could, but it
>>> could also be a way of wrapping a collector, as in the Java 8
>>> Collectors.groupingBy [1].)
>>>
>>> With this, we could write code like:
>>>
>>> collection
>>>     .map(myDoFn)
>>>     .map((s) -> s.toString())
>>>     .collect(new IntegerCombineFn())
>>>     .apply(GroupByKey.of());
>>>
>>> That said, my two concerns are:
>>> (1) having two similar but different Java APIs. If we have a more
>>> idiomatic way of writing pipelines in Java, we should make that the
>>> standard. Otherwise, users will be confused by seeing "Beam" examples
>>> written in multiple, incompatible syntaxes.
>>> (2) making sure the above is truly idiomatic Java and that it doesn't
>>> have any conflicts with the cross-language Beam programming model. I don't
>>> think it does. We have (I believe) chosen to make the Python and Go SDKs
>>> idiomatic for those languages where possible.
>>>
>>> If this work is focused on making the Java SDK more idiomatic (and thus
>>> easier for Java users to learn), it seems like a good thing. We should just
>>> make sure it doesn't scope-creep into defining an entirely new DSL or SDK.
>>>
>>> [1] https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#groupingBy-java.util.function.Function-
>>>
>>> On Tue, Mar 13, 2018 at 11:06 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>>> Yep.
>>>>
>>>> I know the rationale and it makes sense, but it also raises the entry
>>>> barrier for users and is not that smooth in IDEs, in particular for
>>>> custom code.
>>>>
>>>> So I really think it makes sense to build a user-friendly API on top of
>>>> the Beam core developer one.
>>>>
>>>> On 13 March 2018 at 18:35, "Aljoscha Krettek" <aljos...@apache.org> wrote:
>>>>
>>>>> https://beam.apache.org/blog/2016/05/27/where-is-my-pcollection-dot-map.html
>>>>>
>>>>> On 11. Mar 2018, at 22:21, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>
>>>>> On 12 March 2018 at 00:16, "Reuven Lax" <re...@google.com> wrote:
>>>>>
>>>>> I think it would be interesting to see what a Java stream-based API
>>>>> would look like. As I mentioned elsewhere, we are not limited to having
>>>>> only one API for Beam.
>>>>>
>>>>> If I remember correctly, a Java stream API was considered for Dataflow
>>>>> back at the very beginning. I don't completely remember why it was
>>>>> rejected, but I suspect at least part of the reason might have been that
>>>>> Java streams were considered too new and untested back then.
>>>>>
>>>>> Coders are broken - type variables don't have bounds except Object - and
>>>>> reducers are not trivial to implement in general, I guess.
>>>>>
>>>>> However, being close to this API can help a lot, so +1 to try to have a
>>>>> Java DSL on top of the current API. It would also be neat to integrate it
>>>>> with CompletionStage :).
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Sun, Mar 11, 2018 at 2:29 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> On 11 March 2018 at 21:18, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>>>>>>
>>>>>> Hi Romain,
>>>>>>
>>>>>> I remember we discussed the way to express pipelines a while ago.
>>>>>>
>>>>>> I was a fan of a "DSL" compared to the one we have in Camel: instead of
>>>>>> using apply(), use a dedicated form (like .map(), .reduce(), etc.; AFAIR
>>>>>> that's the approach in Flume).
>>>>>> However, we agreed that the apply() syntax gives a more flexible approach.
>>>>>>
>>>>>> Using Java Stream is interesting, but I'm afraid we would have the same
>>>>>> issue as the one we identified when discussing the "fluent Java SDK".
>>>>>> However, we can have a Stream API DSL on top of the SDK, IMHO.
>>>>>>
>>>>>> Agreed, and a Beam stream interface (copying the JDK API but making
>>>>>> lambdas serializable to avoid the need for casts).
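The "avoid the cast need" point refers to the intersection cast that plain java.util.function types require to yield a serializable lambda; a dedicated functional interface extending Serializable (the approach Beam's SerializableFunction takes) removes it. A minimal standalone sketch, with `SerFunction` as an illustrative stand-in rather than the real Beam type:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableLambdaDemo {
    // A functional interface that is also Serializable: lambdas targeting it
    // are serializable by construction, with no cast at the call site.
    @FunctionalInterface
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    public static void main(String[] args) throws Exception {
        // Plain java.util.function types need an intersection cast at every
        // call site to produce a serializable lambda:
        Function<String, Integer> plain =
                (Function<String, Integer> & Serializable) String::length;

        // The dedicated interface makes the same method reference serializable
        // without any ceremony:
        SerFunction<String, Integer> ser = String::length;

        // Both round-trip through Java serialization.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(plain);
            out.writeObject(ser);
        }
        System.out.println(ser.apply("beam")); // 4
    }
}
```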
>>>>>> On my side I think it lets users discover the API. If you check my PoC
>>>>>> implementation you quickly see the steps needed to do simple things like
>>>>>> a map, which is a first-class citizen.
>>>>>>
>>>>>> I'm also curious whether we could implement reduce with "pipeline result
>>>>>> = get an output of a batch from the runner (client) JVM". I see how to do
>>>>>> it for longs - with metrics - but not for collect().
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 11/03/2018 19:46, Romain Manni-Bucau wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I don't know if you have already experimented with the Java Stream API
>>>>>>> as a replacement for the pipeline API, but I did some tests:
>>>>>>> https://github.com/rmannibucau/jbeam
>>>>>>>
>>>>>>> It is far from complete but already shows where it fails (Beam doesn't
>>>>>>> have a way to reduce in the caller machine for instance, coder handling
>>>>>>> is not that trivial, lambdas don't work well with the default Stream
>>>>>>> API, etc.).
>>>>>>>
>>>>>>> However, it is interesting to see that having such an API is pretty
>>>>>>> natural compared to the pipeline API, so I wonder whether Beam should
>>>>>>> work on its own Stream API (with surely another name, for obvious
>>>>>>> reasons ;)).
>>>>>>>
>>>>>>> Romain Manni-Bucau
>>>>>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>
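As a closing illustration of the Collectors.groupingBy reference [1] made earlier in the thread, grouping itself can be expressed as a collector rather than a dedicated GroupByKey transform. A plain-JDK sketch; the word list and first-letter key are chosen purely for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingByDemo {
    public static void main(String[] args) {
        // Group words by an arbitrary key (their first letter): the Stream
        // analogue of a GroupByKey over KV<Character, String> pairs.
        Map<Character, List<String>> grouped =
                Arrays.asList("apple", "avocado", "banana").stream()
                        .collect(Collectors.groupingBy(w -> w.charAt(0)));

        System.out.println(grouped.get('a')); // [apple, avocado]
        System.out.println(grouped.get('b')); // [banana]
    }
}
```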