This misses the split of collect into three parts (supplier, combiner, aggregator), but overall I agree.
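For reference, a minimal sketch of the three-part collect in the plain JDK Stream API, which names the parts supplier, accumulator and combiner. The class and method names (`CollectInThreeParts`, `lengths`) are made up for illustration. Beam's `CombineFn` covers similar ground with `createAccumulator`, `addInput`, `mergeAccumulators` and `extractOutput`, but how a Beam stream API would map onto it is not settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CollectInThreeParts {
    // Collects the length of each word into a list using the three-argument
    // Stream.collect(supplier, accumulator, combiner).
    static List<Integer> lengths(Stream<String> words) {
        return words.collect(
            () -> new ArrayList<Integer>(),         // supplier: creates an empty container
            (list, word) -> list.add(word.length()), // accumulator: folds one element in
            (left, right) -> left.addAll(right));    // combiner: merges two partial containers
    }

    public static void main(String[] args) {
        System.out.println(lengths(Stream.of("beam", "stream", "api")));
    }
}
```

The combiner is only invoked when the stream is evaluated in parallel, which is the part a single-function `collect(CombineFn)` shorthand hides.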
I'd just take the Java Stream API, remove the "client" methods (or make them big-data capable if possible), ensure all hooks are serializable to avoid hacks, and add an unwrap to access the underlying pipeline in case we need something very custom. For me, that would be enough.

On 13 March 2018 at 19:26, "Ben Chambers" <bchamb...@apache.org> wrote:

> I think the existing rationale (not introducing lots of special fluent
> methods) makes sense. However, if we look at the Java Stream API, we
> probably wouldn't need to introduce *a lot* of fluent builders to get most
> of the functionality. Specifically, if we focus on map, flatMap, and
> collect from the Stream API, and a few extensions, we get something like:
>
> * collection.map(DoFn) for applying a ParDo
> * collection.map(SerializableFn) for Java8 lambda shorthand
> * collection.flatMap(SerializableFn) for Java8 lambda shorthand
> * collection.collect(CombineFn) for applying a CombineFn
> * collection.apply(PTransform) for applying a composite transform. Note
>   that PTransforms could also use serializable lambdas for definition.
>
> (Note that GroupByKey doesn't even show up here -- it could, but that
> could also be a way of wrapping a collector, as in the Java8
> Collectors.groupingBy [1].)
>
> With this, we could write code like:
>
> collection
>   .map(myDoFn)
>   .map((s) -> s.toString())
>   .collect(new IntegerCombineFn())
>   .apply(GroupByKey.of());
>
> That said, my two concerns are:
> (1) having two similar but different Java APIs. If we have a more idiomatic
> way of writing pipelines in Java, we should make that the standard.
> Otherwise, users will be confused by seeing "Beam" examples written in
> multiple, incompatible syntaxes.
> (2) making sure the above is truly idiomatic Java and that it doesn't have
> any conflicts with the cross-language Beam programming model. I don't think
> it does. We have (I believe) chosen to make the Python and Go SDKs idiomatic
> for those languages where possible.
>
> If this work is focused on making the Java SDK more idiomatic (and thus
> easier for Java users to learn), it seems like a good thing. We should just
> make sure it doesn't scope-creep into defining an entirely new DSL or SDK.
>
> [1] https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#groupingBy-java.util.function.Function-
>
> On Tue, Mar 13, 2018 at 11:06 AM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> Yep
>>
>> I know the rationale and it makes sense, but it also raises the entry
>> barrier for users and is not that smooth in IDEs, in particular for custom
>> code.
>>
>> So I really think it makes sense to build a user-friendly API on top of
>> Beam's core developer API.
>>
>>
>> On 13 March 2018 at 18:35, "Aljoscha Krettek" <aljos...@apache.org> wrote:
>>
>>> https://beam.apache.org/blog/2016/05/27/where-is-my-pcollection-dot-map.html
>>>
>>> On 11. Mar 2018, at 22:21, Romain Manni-Bucau <rmannibu...@gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> On 12 March 2018 at 00:16, "Reuven Lax" <re...@google.com> wrote:
>>>
>>> I think it would be interesting to see what a Java stream-based API
>>> would look like. As I mentioned elsewhere, we are not limited to having
>>> only one API for Beam.
>>>
>>> If I remember correctly, a Java stream API was considered for Dataflow
>>> back at the very beginning. I don't completely remember why it was
>>> rejected, but I suspect at least part of the reason might have been that
>>> Java streams were considered too new and untested back then.
>>>
>>>
>>> Coders are broken (type variables have no bounds other than Object), and
>>> reducers are generally not trivial to implement, I guess.
>>>
>>> However, staying close to this API can help a lot, so +1 to trying a
>>> Java DSL on top of the current API. It would also be neat to integrate it
>>> with CompletionStage :).
>>>
>>>
>>>
>>> Reuven
>>>
>>>
>>> On Sun, Mar 11, 2018 at 2:29 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On 11 March 2018 at 21:18, "Jean-Baptiste Onofré" <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>> Hi Romain,
>>>>
>>>> I remember we discussed how to express pipelines a while ago.
>>>>
>>>> I was a fan of a "DSL" comparable to the one we have in Camel: instead of
>>>> using apply(), use a dedicated form (like .map(), .reduce(), etc.; AFAIR,
>>>> it's the approach in Flume).
>>>> However, we agreed that the apply() syntax gives a more flexible approach.
>>>>
>>>> Using Java Stream is interesting, but I'm afraid we would have the same
>>>> issue as the one we identified when discussing a "fluent Java SDK".
>>>> However, we can have a Stream API DSL on top of the SDK IMHO.
>>>>
>>>>
>>>> Agreed, plus a Beam stream interface (copying the JDK API but making
>>>> lambdas serializable to avoid the need for a cast).
>>>>
>>>> On my side, I think it enables users to discover the API. If you check my
>>>> PoC implementation, you quickly see the steps needed to do simple things
>>>> like a map, which is a first-class citizen.
>>>>
>>>> I am also curious whether we could implement reduce with the pipeline
>>>> result, i.e. get the output of a batch from the runner (client) JVM. I see
>>>> how to do it for longs (with metrics) but not for collect().
>>>>
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 11/03/2018 19:46, Romain Manni-Bucau wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I don't know if you have already tried using the Java Stream API as a
>>>>> replacement for the pipeline API, but I did some tests:
>>>>> https://github.com/rmannibucau/jbeam
>>>>>
>>>>> It is far from complete but already shows where it fails (Beam
>>>>> doesn't have a way to reduce on the caller machine, for instance, coder
>>>>> handling is not that trivial, lambdas do not work well with the default
>>>>> Stream API, etc.).
>>>>>
>>>>> However, it is interesting to see that having such an API is pretty
>>>>> natural compared to the pipeline API,
>>>>> so I wonder if Beam should work on its own Stream API (surely with
>>>>> another name, for obvious reasons ;)).
>>>>>
>>>>> Romain Manni-Bucau
>>>>> @rmannibucau <https://twitter.com/rmannibucau> |
>>>>> Blog <https://rmannibucau.metawerx.net/> |
>>>>> Old Blog <http://rmannibucau.wordpress.com> |
>>>>> Github <https://github.com/rmannibucau> |
>>>>> LinkedIn <https://www.linkedin.com/in/rmannibucau> |
>>>>> Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>
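The "cast need" mentioned in the thread can be sketched in plain Java as follows. With the JDK's `java.util.function.Function`, a lambda is only serializable if it is cast to an intersection type, whereas an interface that extends both `Function` and `Serializable` makes every lambda assigned to it serializable. `SerializableFn` here is a hypothetical interface named after the shorthand used in the thread, not an existing Beam type, and `canSerialize` is an illustrative helper.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableLambdas {
    // Hypothetical interface: a Function that is also Serializable, so lambdas
    // targeting it need no explicit cast.
    public interface SerializableFn<A, B> extends Function<A, B>, Serializable {}

    // Plain JDK Function: the lambda is NOT serializable.
    static Function<String, Integer> plain() {
        return s -> s.length();
    }

    // Plain JDK Function made serializable through an intersection cast.
    static Function<String, Integer> withCast() {
        return (Function<String, Integer> & Serializable) s -> s.length();
    }

    // Same lambda, no cast: the target type already extends Serializable.
    static SerializableFn<String, Integer> withoutCast() {
        return s -> s.length();
    }

    // Returns true if Java serialization accepts the object.
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) { // includes NotSerializableException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("plain:       " + canSerialize(plain()));
        System.out.println("withCast:    " + canSerialize(withCast()));
        System.out.println("withoutCast: " + canSerialize(withoutCast()));
    }
}
```

A stream-like interface whose `map` and `flatMap` accept such a type would let users write bare lambdas while still allowing a runner to ship them to workers.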