What do you mean by reactive?

On Fri, Mar 9, 2018, 6:58 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
> @Kenn: why not prefer making Beam reactive? It would allow scaling much
> further without having to heavily synchronize multithreading. Elegant and
> efficient :). Beam 3?
>
>
> On Mar 9, 2018, 22:49, "Kenneth Knowles" <k...@google.com> wrote:
>
>> I will start with the "exciting futuristic" answer, which is that we
>> envision the new DoFn being able to provide an automatic ExecutorService
>> parameter that you can use as you wish.
>>
>>     new DoFn<>() {
>>       @ProcessElement
>>       public void process(ProcessContext ctx, ExecutorService executorService) {
>>         ... launch some futures, put them in instance vars ...
>>       }
>>
>>       @FinishBundle
>>       public void finish(...) {
>>         ... block on futures, output results if appropriate ...
>>       }
>>     }
>>
>> This way, the Java SDK harness can use its overarching knowledge of what
>> is going on in a computation to, for example, share a thread pool between
>> different bits. This was one reason to delete IntraBundleParallelization:
>> it didn't allow the runner and user code to properly manage how many things
>> were going on concurrently. Mostly, the runner should own parallelizing
>> to max out cores, and what user code needs is asynchrony hooks that can
>> interact with that. However, this feature is not thoroughly considered. It
>> is TBD how much the harness itself manages blocking on outstanding requests
>> versus it being your responsibility in FinishBundle, etc.
>>
>> I haven't explored rolling your own here, if you are willing to do the
>> knob tuning to get the threading acceptable for your particular use case.
>> Perhaps someone else can weigh in.
>>
>> Kenn
>>
>> On Fri, Mar 9, 2018 at 1:38 PM Josh Ferge <josh.fe...@bounceexchange.com>
>> wrote:
>>
>>> Hello all:
>>>
>>> Our team has pipelines that make external network calls. These
>>> pipelines are currently super slow, and the hypothesis is that they are
>>> slow because we are not threading our network calls. The GitHub issue
>>> below provides some discussion around this:
>>>
>>> https://github.com/apache/beam/pull/957
>>>
>>> In Beam 1.0, there was IntraBundleParallelization, which helped with
>>> this. However, it was removed because it didn't comply with a few Beam
>>> paradigms.
>>>
>>> Questions going forward:
>>>
>>> What is advised for jobs that make blocking network calls? It seems that
>>> bundling the elements into groups of size X prior to passing them to the
>>> DoFn, and managing the threading within the function, might work.
>>> Thoughts?
>>> Are these types of jobs even suitable for Beam?
>>> Are there any plans to develop features that help with this?
>>>
>>> Thanks
>>>
>>
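
For reference, the "launch futures in process, block on them in finish" pattern from Kenn's reply can be approximated today with a thread pool you manage yourself. The following is only a minimal plain-Java sketch of that idea, not Beam API: the class name `AsyncBundleSketch`, its methods, and the `blockingCall` function are all hypothetical stand-ins. In a real DoFn the three methods would correspond to `@StartBundle`, `@ProcessElement`, and `@FinishBundle`, and output emitted from `@FinishBundle` would also need an explicit timestamp and window, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

/**
 * Plain-Java stand-in for a DoFn that overlaps blocking calls within one bundle.
 * Hypothetical names throughout; this is not part of the Beam SDK.
 */
final class AsyncBundleSketch<InT, OutT> {
  // User-managed pool (assumption: sized by hand for the workload).
  private final ExecutorService executor;
  // Hypothetical blocking call, e.g. an external network request.
  private final Function<InT, OutT> blockingCall;
  // Futures launched during the current bundle, in input order.
  private final List<CompletableFuture<OutT>> pending = new ArrayList<>();

  AsyncBundleSketch(ExecutorService executor, Function<InT, OutT> blockingCall) {
    this.executor = executor;
    this.blockingCall = blockingCall;
  }

  /** Would be @StartBundle: reset per-bundle state. */
  void startBundle() {
    pending.clear();
  }

  /** Would be @ProcessElement: launch the call and return immediately. */
  void processElement(InT element) {
    pending.add(CompletableFuture.supplyAsync(() -> blockingCall.apply(element), executor));
  }

  /** Would be @FinishBundle: block on every outstanding future, then emit. */
  List<OutT> finishBundle() {
    List<OutT> results = new ArrayList<>(pending.size());
    for (CompletableFuture<OutT> future : pending) {
      // join() rethrows any failure as an unchecked CompletionException.
      results.add(future.join());
    }
    pending.clear();
    return results;
  }
}
```

With this shape the pool size is the main knob: it caps how many blocking calls are in flight per instance, which is the tuning Kenn mentions having to do by hand.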