Yes. The very original Python API didn't have GBK, just a lambda-parameterized groupBy.
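To make the distinction concrete, here is a minimal Python sketch in which a collection is just a list. The helpers `group_by_key` and `group_by` are hypothetical illustrations, not Beam APIs.

- The GBK-style helper only accepts elements that are already (key, value) pairs. That element-type constraint is what keeps a raw GBK from being a plain fluent method.
- The lambda-parameterized `group_by` computes the key itself, so it works on any element type.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def group_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """GBK-style grouping (hypothetical helper): elements must already be (key, value) pairs."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:  # each element must unpack into exactly (key, value)
        groups[key].append(value)
    return dict(groups)


def group_by(elements: Iterable[T], key_fn: Callable[[T], K]) -> Dict[K, List[T]]:
    """Lambda-parameterized grouping (hypothetical helper): the key is computed per element."""
    groups: Dict[K, List[T]] = defaultdict(list)
    for element in elements:
        groups[key_fn(element)].append(element)
    return dict(groups)


words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# GBK style: the caller must first reshape each element into a (key, value) pair.
by_key = group_by_key((w[0], w) for w in words)

# groupBy style: a key-extractor lambda works on the raw elements directly.
by_lambda = group_by(words, lambda w: w[0])

print(by_lambda)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Both calls produce the same grouping. The difference is only in who is responsible for producing the key.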
On Sat, Mar 17, 2018, 12:21 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Gbk can be fluent if you pass a key extractor lambda ;)
>
> On 17 Mar 2018 00:00, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>
>> Big +1
>>
>> Regards
>> JB
>> On 16 Mar 2018, at 15:59, Reuven Lax <re...@google.com> wrote:
>>>
>>> BTW, while it's true that raw GBK can't be fluent (due to the constraint on the element type), once we have schema support we can introduce groupByField, and that can be fluent.
>>>
>>>
>>> On Wed, Mar 14, 2018 at 11:50 PM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>>> On Wed, Mar 14, 2018 at 11:04 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>
>>>> > On 15 Mar 2018 06:52, "Robert Bradshaw" <rober...@google.com> wrote:
>>>>
>>>> >> The stream API was looked at way back when we were designing the API; one of the primary reasons it was not pursued further at the time was the demand for Java 7 compatibility. It is also much more natural with lambdas, but unfortunately the Java compiler discards types in this case, making coder inference impossible. It is still interesting to explore, and I've been toying with using this wrapping method for other applications (specifically, giving a Pandas DataFrame API to PCollections in Python).
>>>>
>>>> >> There's a higher-level question lingering here about making things more fluent by putting methods on PCollections in our primary API. It was somewhat of an experiment to take the very pure approach of *everything* being expressed as a PTransform, and this is not without its disadvantages, and (gasp) may be worth revisiting. In particular, some things that have changed in the meantime are:
>>>>
>>>> >> * The Java SDK is no longer *the* definition of the model. The model has been (mostly) formalized in the portability work, and the general Beam concepts and notion of PTransform are much more widely fleshed out and understood.
>>>>
>>>> > This is wrong for all Java users, who are still the mainstream. It is important to keep that in mind, and even if I know the portable API is something important for you,
>>>>
>>>> I think you misunderstood me. My point is that it is now much easier to disentangle the essence of the Beam model (reified in part in the portable API) from the Java API itself (which may evolve more independently, whereas formerly syntactic sugar here would be conflated with core concepts).
>>>>
>>>> > it is something which should stay on top of runners and their APIs, which means Java for all but one.
>>>>
>>>> > All that to say that the most common default is Java.
>>>>
>>>> I don't think it'll be that way for long; Scala alone might give Java a run for its money.
>>>>
>>>> > However, I agree each language should have its own natural API and should absolutely not just port over the same API, the goal being to respect each language's own philosophy.
>>>>
>>>> > Conclusion: Java needs a more expressive, stream-like API.
>>>>
>>>> > There is another way to see it: catching up on API debt compared to competing APIs.
>>>>
>>>>
>>>> >> * Java 8's lambdas, etc., allow for a much more succinct representation of operations, which makes the relative cost of the apply boilerplate that much higher. This is one of the struggles we had with the Python API: pcoll.apply(Map(lambda ...)) made the "apply" feel *very* redundant. pcoll | Map(...) is at least closer to pcoll.map(...).
>>>> >> * With over two years of experience with the 100% pure approach, we still haven't "gotten used to it" enough that adding such methods isn't appealing. (Note that, by design, adding such methods later is always easier than taking them away, which was one justification for starting at the extreme point.)
>>>>
>>>> >> Even if we go this route, there's no need to remove apply, and
>>>>
>>>> >> pcoll
>>>> >>     .map(...)
>>>> >>     .apply(...)
>>>> >>     .flatMap(...)
>>>>
>>>> >> flows fairly well (with map/flatMap being syntactic sugar for apply).
>>>>
>>>> >> Agreed, but the issue with that is that you lose the natural approach and it is harder to rework, whereas having an API on top of "apply" lets you keep both concerns separate.
>>>>
>>>> Having multiple APIs is undesirable; it's best to have one unless there are hard constraints that prevent it (e.g. if the two would be jarringly inconsistent, or one is forced by an interface, etc.).
>>>>
>>>> >> Also, the PCollection API is what is complex (coders, sides, ...) and what I hope we can hide behind another API.
>>>>
>>>> I'd like to simplify things as well.
>>>>
>>>> >> I think we would also still have to use apply for parameterless operations like GBK that place constraints on the element types. I don't see how to do combinePerKey either (though, asymmetrically, globalCombine is fine).
>>>>
>>>> >> The largest fear I have is feature creep. There would have to be a very clear line of what's in and what's not, likely with what's in being a very short list (which is probably OK and would give the biggest gain, but not much discoverability). The criteria can't be primitives (GBK is problematic, and the most natural map isn't really the full ParDo primitive; in fact, the full ParDo might be "advanced" enough to merit requiring apply).
>>>>
>>>> > Is the previous proposal (the Jet API) an issue?
>>>>
>>>> At first glance, StreamStage doesn't sound to me like a PCollection (it mixes the notions of operations and values), and methods like flatMapUsingContext and hashJoin2 seem far down the slippery slope. But I haven't spent that much time looking at it.
>>>>
>>>> >> Who knows; though I still think we made the right decision to attempt apply-only at the time, maybe I'll have to flesh this out into a new blog post that is a rebuttal to my original one :).
>>>>
>>>> > Maybe for some users, but clearly not for the ones I met in the last 3 months (what they said when opening their IDE is censored ;)).
>>>>
>>>
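The idea of keeping apply while adding map/flatMap as syntactic sugar for it can be sketched in Python with a toy in-memory collection. Everything here is hypothetical and is not the Apache Beam API: `ToyPCollection`, `ToyTransform`, `ToyMap`, `ToyFlatMap` and `ToyFilter`.

The sketch only shows that fluent methods can delegate to a single apply entry point. As a result, `apply`, Python's `|` operator, and chained methods all build the same result.

```python
from typing import Callable, Generic, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class ToyTransform:
    """Hypothetical stand-in for a PTransform: turns one collection into another."""

    def expand(self, pcoll: "ToyPCollection") -> "ToyPCollection":
        raise NotImplementedError


class ToyMap(ToyTransform):
    def __init__(self, fn: Callable):
        self.fn = fn

    def expand(self, pcoll):
        return ToyPCollection(self.fn(x) for x in pcoll.elements)


class ToyFlatMap(ToyTransform):
    def __init__(self, fn: Callable):
        self.fn = fn

    def expand(self, pcoll):
        return ToyPCollection(y for x in pcoll.elements for y in self.fn(x))


class ToyFilter(ToyTransform):
    """A transform with no fluent method of its own, so it is only reachable via apply."""

    def __init__(self, predicate: Callable):
        self.predicate = predicate

    def expand(self, pcoll):
        return ToyPCollection(x for x in pcoll.elements if self.predicate(x))


class ToyPCollection(Generic[A]):
    """Hypothetical in-memory collection; not Beam's PCollection (no coders, no windowing)."""

    def __init__(self, elements: Iterable[A]):
        self.elements: List[A] = list(elements)

    def apply(self, transform: ToyTransform) -> "ToyPCollection":
        # The single entry point: every operation is expressed as a transform.
        return transform.expand(self)

    def __or__(self, transform: ToyTransform) -> "ToyPCollection":
        # Pipe syntax, mirroring `pcoll | Map(...)`; identical to apply.
        return self.apply(transform)

    def map(self, fn: Callable[[A], B]) -> "ToyPCollection[B]":
        # Fluent sugar: build the transform and delegate to apply.
        return self.apply(ToyMap(fn))

    def flat_map(self, fn: Callable[[A], Iterable[B]]) -> "ToyPCollection[B]":
        # Fluent sugar: build the transform and delegate to apply.
        return self.apply(ToyFlatMap(fn))


lines = ToyPCollection(["a b", "c", "d e f"])

# Three spellings of the same pipeline.
via_apply = lines.apply(ToyFlatMap(str.split)).apply(ToyMap(str.upper))
via_pipe = lines | ToyFlatMap(str.split) | ToyMap(str.upper)
via_fluent = lines.flat_map(str.split).map(str.upper)

# Fluent methods and apply mix freely, as in pcoll.map(...).apply(...).flatMap(...).
mixed = lines.map(str.strip).apply(ToyFilter(lambda s: " " in s)).flat_map(str.split)

print(via_fluent.elements)  # ['A', 'B', 'C', 'D', 'E', 'F']
print(mixed.elements)       # ['a', 'b', 'd', 'e', 'f']
```

In this sketch, adding another fluent method is a one-line delegation to apply, whereas removing one would break its callers. That is the asymmetry mentioned in the thread.

The toy deliberately leaves out coders, type inference from lambdas, and key-constrained operations such as GBK. Those are where the real design questions lie.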