Re: Splittable-Dofn not distributing the work to multiple workers

Luke Cwik Fri, 21 Aug 2020 11:45:48 -0700

Yes it does.

There should be a reshuffle between the initial splitting and the
processing portion.
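
As a rough illustration of why that reshuffle matters: without a redistribution step, the ranges produced by the initial split would typically be processed on the same worker that produced them. The sketch below is a toy model in plain Java, not Beam or Dataflow code. The class and method names are made up for this example, worker 0 is assumed to be the worker that ran the split, and round-robin assignment stands in for the random keys a real reshuffle would likely use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, no Beam or Dataflow APIs involved.
class ReshuffleSketch {

    // Initial splitting: cut [0, totalSize) into consecutive ranges of at most rangeSize.
    static List<long[]> splitIntoRanges(long totalSize, long rangeSize) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < totalSize; start += rangeSize) {
            ranges.add(new long[] {start, Math.min(start + rangeSize, totalSize)});
        }
        return ranges;
    }

    // No redistribution: every range stays on the worker that ran the split
    // (assumed to be worker 0 in this model).
    static int[] assignWithoutReshuffle(int rangeCount, int workers) {
        int[] perWorker = new int[workers];
        perWorker[0] = rangeCount;
        return perWorker;
    }

    // With redistribution: ranges are dealt out across all workers.
    // Round-robin is used here for determinism; it is an assumption of this sketch.
    static int[] assignWithReshuffle(int rangeCount, int workers) {
        int[] perWorker = new int[workers];
        for (int i = 0; i < rangeCount; i++) {
            perWorker[i % workers]++;
        }
        return perWorker;
    }

    public static void main(String[] args) {
        List<long[]> ranges = splitIntoRanges(1000, 125);
        int workers = 4;
        System.out.println("ranges: " + ranges.size());
        System.out.println("without reshuffle: "
                + Arrays.toString(assignWithoutReshuffle(ranges.size(), workers)));
        System.out.println("with reshuffle:    "
                + Arrays.toString(assignWithReshuffle(ranges.size(), workers)));
    }
}
```

In this model, all of the ranges land on one worker when nothing redistributes them, while the redistributed case spreads them evenly. If the real pipeline looks like the first case, that would suggest the reshuffle is missing or not taking effect.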


On Fri, Aug 21, 2020 at 11:04 AM Jiadai Xia <daniel...@google.com> wrote:

> I am using v1. Does v1 support initial splitting and distribution?
> I expected it to distribute the initial splits across multiple workers.
>
> On Fri, Aug 21, 2020 at 11:00 AM Luke Cwik <lc...@google.com> wrote:
>
>> Are you using Dataflow runner v2 [1]? The default for Beam Java is still
>> Dataflow runner v1.
>> Dataflow runner v2 is the only one that supports autoscaling and dynamic
>> splitting of splittable DoFns in bounded pipelines.
>>
>> 1:
>> https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
>>
>> On Fri, Aug 21, 2020 at 10:54 AM Jiadai Xia <daniel...@google.com> wrote:
>>
>>> Hi,
>>> As stated in the title, I implemented an SDF for reading Parquet
>>> files and am trying to run it with the Dataflow runner. The initial
>>> split outputs a bunch of ranges, but the number of workers is not scaled
>>> up and the work is not distributed. Any suggestions on what the problem
>>> might be?
>>> I have tested it with the Direct runner, and the parallelism looks fine
>>> on small samples.
>>> Below is my implementation of the SDF:
>>> https://github.com/apache/beam/pull/12223
>>> --
>>> *Jiadai Xia*
>>> SWE Intern
>>> daniel...@google.com
>>>
>
