Re: Splittable-Dofn not distributing the work to multiple workers

Jiadai Xia Fri, 21 Aug 2020 11:04:02 -0700

I am using v1. Does v1 support initial splitting and distribution? I
expected it to distribute the results of the initial split across multiple
workers.
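
For reference, the initial split discussed here turns one restriction into
several sub-ranges. Below is a minimal stand-alone sketch of that idea. It
does not use the Beam API, and the class name, method name, and chunk size
are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InitialSplitSketch {
    // Hypothetical helper, not part of the Beam API: cut the half-open range
    // [start, end) into consecutive sub-ranges of at most chunkSize elements.
    static List<long[]> split(long start, long end, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long from = start; from < end; from += chunkSize) {
            // The last sub-range may be shorter than chunkSize.
            long to = Math.min(from + chunkSize, end);
            ranges.add(new long[] {from, to});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print each sub-range of [0, 10) using chunks of 4 elements.
        for (long[] r : split(0, 10, 4)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Each sub-range is an independent unit of work. Whether those units end up on
different workers is decided by the runner, not by the splitting code.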


On Fri, Aug 21, 2020 at 11:00 AM Luke Cwik <[email protected]> wrote:

> Are you using Dataflow runner v2 [1]? The default for Beam Java is still
> Dataflow runner v1.
> Dataflow runner v2 is the only one that supports autoscaling and dynamic
> splitting of splittable DoFns in bounded pipelines.
>
> 1:
> https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
>
> On Fri, Aug 21, 2020 at 10:54 AM Jiadai Xia <[email protected]> wrote:
>
>> Hi,
>> As stated in the title, I tried to implement an SDF for reading Parquet
>> files, and I am trying to run it with the Dataflow runner. The initial
>> split outputs a bunch of ranges, but the number of workers is not scaled up
>> and the work is not distributed. Any suggestions on what the problem could
>> be? I have tested it with the Direct runner, and the parallelism looks fine
>> on small samples.
>> Below is my implementation of the SDF:
>> https://github.com/apache/beam/pull/12223
>> --
>> *Jiadai Xia*
>> SWE Intern

