Yes. For the DoFn, for instance: instead of ProcessContext.element() handing you a <T>, you would get a CompletionStage<T>, and output would accept one as well.
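To make that concrete, a rough sketch of what such a DoFn could look like - purely hypothetical, since Request, Response, AsyncProcessContext, a CompletionStage-returning element(), an async output() and callServiceAsync() are made-up names and no such API exists in Beam today:

new DoFn<Request, Response>() {
  @ProcessElement
  public void process(AsyncProcessContext<Request, Response> ctx) {
    // Hypothetical: element() hands back a CompletionStage instead of the value itself...
    CompletionStage<Request> element = ctx.element();
    // ...so the remote call can be chained without blocking; the runner would only
    // mark the element as done once the stage handed to output() completes.
    ctx.output(element.thenCompose(request -> callServiceAsync(request)));
  }
}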
This way you register an execution chain. Mixed with streams, you get a big-data Java 8/9/10 API which enables any connectivity in a well-performing manner ;).

On 10 Mar 2018 13:56, "Reuven Lax" <re...@google.com> wrote:

> So you mean the user should have a way of registering asynchronous activity with a callback (the callback must be registered with Beam, because Beam needs to know not to mark the element as done until all associated callbacks have completed). I think that's basically what Kenn was suggesting, unless I'm missing something.
>
> On Fri, Mar 9, 2018 at 11:07 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> Yes, callback based. Beam today is synchronous, and until bundles + combines are reactive-friendly, Beam will stay synchronous whatever the other parts do. Becoming reactive would make it possible to manage the threading issues properly and to get better scalability of the overall execution when remote IO is involved.
>>
>> However, it requires breaking the source and SDF design to use CompletionStage - or an equivalent - to chain the processing properly and in a unified fashion.
>>
>> On 9 Mar 2018 23:48, "Reuven Lax" <re...@google.com> wrote:
>>
>> If you're talking about reactive programming, at a certain level Beam is already reactive. Are you referring to a specific way of writing the code?
>>
>> On Fri, Mar 9, 2018 at 1:59 PM Reuven Lax <re...@google.com> wrote:
>>
>>> What do you mean by reactive?
>>>
>>> On Fri, Mar 9, 2018, 6:58 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>>> @Kenn: why not prefer to make Beam reactive? It would allow scaling much further without having to synchronize multithreading by hand. Elegant and efficient :). Beam 3?
>>>>
>>>> On 9 Mar 2018 22:49, "Kenneth Knowles" <k...@google.com> wrote:
>>>>
>>>>> I will start with the "exciting futuristic" answer, which is that we envision the new DoFn to be able to provide an automatic ExecutorService parameter that you can use as you wish.
>>>>>
>>>>> new DoFn<>() {
>>>>>   @ProcessElement
>>>>>   public void process(ProcessContext ctx, ExecutorService executorService) {
>>>>>     ... launch some futures, put them in instance vars ...
>>>>>   }
>>>>>
>>>>>   @FinishBundle
>>>>>   public void finish(...) {
>>>>>     ... block on futures, output results if appropriate ...
>>>>>   }
>>>>> }
>>>>>
>>>>> This way, the Java SDK harness can use its overarching knowledge of what is going on in a computation to, for example, share a thread pool between different bits. This was one reason to delete IntraBundleParallelization - it didn't allow the runner and user code to properly manage how many things were going on concurrently. Mostly the runner should own parallelizing to max out cores, and what user code needs is asynchrony hooks that can interact with that. However, this feature is not thoroughly considered. TBD how much the harness itself manages blocking on outstanding requests versus it being your responsibility in FinishBundle, etc.
>>>>>
>>>>> I haven't explored rolling your own here, if you are willing to do the knob tuning to get the threading acceptable for your particular use case. Perhaps someone else can weigh in.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Fri, Mar 9, 2018 at 1:38 PM Josh Ferge <josh.fe...@bounceexchange.com> wrote:
>>>>>
>>>>>> Hello all:
>>>>>>
>>>>>> Our team has a pipeline that makes external network calls.
>>>>>> These pipelines are currently super slow, and the hypothesis is that they are slow because we are not threading our network calls. The GitHub issue below provides some discussion around this:
>>>>>>
>>>>>> https://github.com/apache/beam/pull/957
>>>>>>
>>>>>> In Beam 1.0 there was IntraBundleParallelization, which helped with this. However, it was removed because it didn't comply with a few Beam paradigms.
>>>>>>
>>>>>> Questions going forward:
>>>>>>
>>>>>> What is advised for jobs that make blocking network calls? It seems bundling the elements into groups of size X prior to passing them to the DoFn, and managing the threading within the function, might work. Thoughts?
>>>>>> Are these types of jobs even suitable for Beam?
>>>>>> Are there any plans to develop features that help with this?
>>>>>>
>>>>>> Thanks
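For reference, the "batch upstream, thread inside the DoFn" idea Josh describes can be rolled by hand with today's API. A rough sketch only: it assumes the elements arrive already batched (e.g. via GroupIntoBatches on keyed input, or a small buffering DoFn), and BatchedNetworkCallFn, callService() and the pool size of 16 are placeholders to swap for the real client and a tuned value:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.beam.sdk.transforms.DoFn;

class BatchedNetworkCallFn extends DoFn<List<String>, String> {

  private transient ExecutorService executor;

  @Setup
  public void setup() {
    // Pool size is the knob to tune per use case, as Kenn notes above.
    executor = Executors.newFixedThreadPool(16);
  }

  @ProcessElement
  public void processElement(ProcessContext ctx) throws Exception {
    List<Future<String>> futures = new ArrayList<>();
    for (String element : ctx.element()) {
      // Fan the blocking calls for one batch out onto the pool.
      futures.add(executor.submit(() -> callService(element)));
    }
    // Block until the whole batch is done, then emit the results.
    for (Future<String> future : futures) {
      ctx.output(future.get());
    }
  }

  @Teardown
  public void teardown() {
    executor.shutdown();
  }

  private String callService(String element) {
    // Placeholder for the real blocking network call.
    return element;
  }
}

Outputting from @ProcessElement while the batch completes keeps the element/output accounting simple; the trade-off is that each batch blocks until its slowest call finishes, which is roughly what IntraBundleParallelization used to do for you.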