2018-03-10 16:19 GMT+01:00 Reuven Lax <re...@google.com>: > This is another version (maybe a better, Java 8 idiomatic one?) of what > Kenn suggested. > > Note that with NewDoFn this need not be incompatible (so might not require > waiting till Beam 3.0). We can recognize new parameters to processElement > and populate add needed. >
This is right however in my head it was a single way movemenent to enforce the design to be reactive and not fake a reactive API with a sync and not reactive impl which is what would be done today with both support I fear. > > On Sat, Mar 10, 2018, 12:13 PM Romain Manni-Bucau <rmannibu...@gmail.com> > wrote: > >> Yes, for the dofn for instance, instead of having >> processcontext.element()=<T> you get a CompletionStage<T> and output gets >> it as well. >> >> This way you register an execution chain. Mixed with streams you get a >> big data java 8/9/10 API which enabkes any connectivity in a wel performing >> manner ;). >> >> Le 10 mars 2018 13:56, "Reuven Lax" <re...@google.com> a écrit : >> >>> So you mean the user should have a way of registering asynchronous >>> activity with a callback (the callback must be registered with Beam, >>> because Beam needs to know not to mark the element as done until all >>> associated callbacks have completed). I think that's basically what Kenn >>> was suggesting, unless I'm missing something. >>> >>> >>> On Fri, Mar 9, 2018 at 11:07 PM Romain Manni-Bucau < >>> rmannibu...@gmail.com> wrote: >>> >>>> Yes, callback based. Beam today is synchronous and until >>>> bundles+combines are reactive friendly, beam will be synchronous whatever >>>> other parts do. Becoming reactive will enable to manage the threading >>>> issues properly and to have better scalability on the overall execution >>>> when remote IO are involved. >>>> >>>> However it requires to break source, sdf design to use completionstage >>>> - or equivalent - to chain the processing properly and in an unified >>>> fashion. >>>> >>>> Le 9 mars 2018 23:48, "Reuven Lax" <re...@google.com> a écrit : >>>> >>>> If you're talking about reactive programming, at a certain level beam >>>> is already reactive. Are you referring to a specific way of writing the >>>> code? >>>> >>>> >>>> On Fri, Mar 9, 2018 at 1:59 PM Reuven Lax <re...@google.com> wrote: >>>> >>>>> What do you mean by reactive? >>>>> >>>>> On Fri, Mar 9, 2018, 6:58 PM Romain Manni-Bucau <rmannibu...@gmail.com> >>>>> wrote: >>>>> >>>>>> @Kenn: why not preferring to make beam reactive? Would alow to scale >>>>>> way more without having to hardly synchronize multithreading. Elegant and >>>>>> efficient :). Beam 3? >>>>>> >>>>>> >>>>>> Le 9 mars 2018 22:49, "Kenneth Knowles" <k...@google.com> a écrit : >>>>>> >>>>>>> I will start with the "exciting futuristic" answer, which is that we >>>>>>> envision the new DoFn to be able to provide an automatic ExecutorService >>>>>>> parameters that you can use as you wish. >>>>>>> >>>>>>> new DoFn<>() { >>>>>>> @ProcessElement >>>>>>> public void process(ProcessContext ctx, ExecutorService >>>>>>> executorService) { >>>>>>> ... launch some futures, put them in instance vars ... >>>>>>> } >>>>>>> >>>>>>> @FinishBundle >>>>>>> public void finish(...) { >>>>>>> ... block on futures, output results if appropriate ... >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> This way, the Java SDK harness can use its overarching knowledge of >>>>>>> what is going on in a computation to, for example, share a thread pool >>>>>>> between different bits. This was one reason to delete >>>>>>> IntraBundleParallelization - it didn't allow the runner and user code to >>>>>>> properly manage how many things were going on concurrently. And mostly >>>>>>> the >>>>>>> runner should own parallelizing to max out cores and what user code >>>>>>> needs >>>>>>> is asynchrony hooks that can interact with that. However, this feature >>>>>>> is >>>>>>> not thoroughly considered. TBD how much the harness itself manages >>>>>>> blocking >>>>>>> on outstanding requests versus it being your responsibility in >>>>>>> FinishBundle, etc. >>>>>>> >>>>>>> I haven't explored rolling your own here, if you are willing to do >>>>>>> the knob tuning to get the threading acceptable for your particular use >>>>>>> case. Perhaps someone else can weigh in. >>>>>>> >>>>>>> Kenn >>>>>>> >>>>>>> On Fri, Mar 9, 2018 at 1:38 PM Josh Ferge < >>>>>>> josh.fe...@bounceexchange.com> wrote: >>>>>>> >>>>>>>> Hello all: >>>>>>>> >>>>>>>> Our team has a pipeline that make external network calls. These >>>>>>>> pipelines are currently super slow, and the hypothesis is that they are >>>>>>>> slow because we are not threading for our network calls. The github >>>>>>>> issue >>>>>>>> below provides some discussion around this: >>>>>>>> >>>>>>>> https://github.com/apache/beam/pull/957 >>>>>>>> >>>>>>>> In beam 1.0, there was IntraBundleParallelization, which helped >>>>>>>> with this. However, this was removed because it didn't comply with a >>>>>>>> few >>>>>>>> BEAM paradigms. >>>>>>>> >>>>>>>> Questions going forward: >>>>>>>> >>>>>>>> What is advised for jobs that make blocking network calls? It seems >>>>>>>> bundling the elements into groups of size X prior to passing to the >>>>>>>> DoFn, >>>>>>>> and managing the threading within the function might work. thoughts? >>>>>>>> Are these types of jobs even suitable for beam? >>>>>>>> Are there any plans to develop features that help with this? >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>> >>>>