What do you mean by reactive?

On Fri, Mar 9, 2018, 6:58 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> @Kenn: why not preferring to make beam reactive? Would alow to scale way
> more without having to hardly synchronize multithreading. Elegant and
> efficient :). Beam 3?
>
>
> Le 9 mars 2018 22:49, "Kenneth Knowles" <k...@google.com> a écrit :
>
>> I will start with the "exciting futuristic" answer, which is that we
>> envision the new DoFn to be able to provide an automatic ExecutorService
>> parameters that you can use as you wish.
>>
>>     new DoFn<>() {
>>       @ProcessElement
>>       public void process(ProcessContext ctx, ExecutorService
>> executorService) {
>>           ... launch some futures, put them in instance vars ...
>>       }
>>
>>       @FinishBundle
>>       public void finish(...) {
>>          ... block on futures, output results if appropriate ...
>>       }
>>     }
>>
>> This way, the Java SDK harness can use its overarching knowledge of what
>> is going on in a computation to, for example, share a thread pool between
>> different bits. This was one reason to delete IntraBundleParallelization -
>> it didn't allow the runner and user code to properly manage how many things
>> were going on concurrently. And mostly the runner should own parallelizing
>> to max out cores and what user code needs is asynchrony hooks that can
>> interact with that. However, this feature is not thoroughly considered. TBD
>> how much the harness itself manages blocking on outstanding requests versus
>> it being your responsibility in FinishBundle, etc.
>>
>> I haven't explored rolling your own here, if you are willing to do the
>> knob tuning to get the threading acceptable for your particular use case.
>> Perhaps someone else can weigh in.
>>
>> Kenn
>>
>> On Fri, Mar 9, 2018 at 1:38 PM Josh Ferge <josh.fe...@bounceexchange.com>
>> wrote:
>>
>>> Hello all:
>>>
>>> Our team has a pipeline that make external network calls. These
>>> pipelines are currently super slow, and the hypothesis is that they are
>>> slow because we are not threading for our network calls. The github issue
>>> below provides some discussion around this:
>>>
>>> https://github.com/apache/beam/pull/957
>>>
>>> In beam 1.0, there was IntraBundleParallelization, which helped with
>>> this. However, this was removed because it didn't comply with a few BEAM
>>> paradigms.
>>>
>>> Questions going forward:
>>>
>>> What is advised for jobs that make blocking network calls? It seems
>>> bundling the elements into groups of size X prior to passing to the DoFn,
>>> and managing the threading within the function might work. thoughts?
>>> Are these types of jobs even suitable for beam?
>>> Are there any plans to develop features that help with this?
>>>
>>> Thanks
>>>
>>

Reply via email to