I am using v1. Does v1 support initial splitting and distribution? I expected it to distribute the initial splits across multiple workers.
On Fri, Aug 21, 2020 at 11:00 AM Luke Cwik <[email protected]> wrote:

> Are you using Dataflow runner v2[1]? The default for Beam Java still
> uses Dataflow runner v1.
> Dataflow runner v2 is the only one that supports autoscaling and dynamic
> splitting of splittable DoFns in bounded pipelines.
>
> 1:
> https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
>
> On Fri, Aug 21, 2020 at 10:54 AM Jiadai Xia <[email protected]> wrote:
>
>> Hi,
>> As stated in the title, I tried to implement an SDF for reading
>> Parquet files and I am trying to run it with the Dataflow runner. The
>> initial split outputs a bunch of ranges, but the number of workers is not
>> scaled up and the work is not distributed. Any suggestions on what the
>> problem could be?
>> I have tested it with the Direct runner and the parallelism looks fine on
>> small samples.
>> Below is my implementation of the SDF:
>> https://github.com/apache/beam/pull/12223

--
Jiadai Xia
SWE Intern
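
For readers following the thread, here is a minimal sketch of what "initial splitting" refers to: one large range is cut into fixed-size sub-ranges before processing starts, so that a runner has independent pieces it could hand to different workers. This is plain Java and does not use the Beam API; `InitialSplitSketch`, `Range`, and `split` are hypothetical names chosen for illustration. In a Beam splittable DoFn, the equivalent logic would live in the method annotated with `@SplitRestriction`, and whether the resulting ranges are actually spread across workers depends on the runner, as discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: not Beam code and not the code from the linked PR.
public class InitialSplitSketch {

    /** Half-open range [from, to), standing in for an SDF restriction. */
    public static final class Range {
        public final long from;
        public final long to;

        public Range(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + to + ")";
        }
    }

    /** Cuts {@code whole} into consecutive sub-ranges of at most {@code chunkSize} elements. */
    public static List<Range> split(Range whole, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Range> parts = new ArrayList<>();
        for (long start = whole.from; start < whole.to; start += chunkSize) {
            // The last sub-range is clipped to the end of the original range.
            parts.add(new Range(start, Math.min(start + chunkSize, whole.to)));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints [0, 4), [4, 8) and [8, 10), one per line.
        for (Range r : split(new Range(0, 10), 4)) {
            System.out.println(r);
        }
    }
}
```

Each sub-range can be processed independently of the others, which is what makes distribution possible in the first place; the sketch says nothing about how or when a particular runner schedules them.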
