DanielMorales9 commented on issue #32746: URL: https://github.com/apache/beam/issues/32746#issuecomment-2407648426
Yes, I have the same parallelism problem:

<img width="784" alt="Screenshot 2024-10-11 at 16 54 27" src="https://github.com/user-attachments/assets/33b389fd-1228-4e42-9e80-d2c68ad1af20">

> - Adding `.apply(Redistribute.<Row>arbitrarily().withNumBuckets(<N>))` before the write step, reducing the parallelism to N

Is it similar to Spark's `repartition`? Does it shuffle data? How will it work with autoscaling enabled?

> Adding `.apply(Redistribute.<Row>arbitrarily().withNumBuckets(<N>))` before the write step, reducing the parallelism to N

> Use the `--numberOfWorkerHarnessThreads=N` pipeline option, which sets an upper bound on the number of threads per worker

Right now, I have autoscaling disabled, and I will try to set `N=2` and `machineType=n1-standard-4`.
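To make the "N buckets" idea concrete, here is a toy, in-memory sketch in plain Java. It is not Beam's implementation of `Redistribute.<Row>arbitrarily().withNumBuckets(<N>)`, and the class and method names (`BucketSketch`, `redistribute`) are hypothetical. It only illustrates the assumption behind the suggestion: if every element is tagged with one of N keys and then grouped by that key, at most N groups exist downstream, so at most N writes can run concurrently. Keys are assigned round-robin here for determinism; Beam's actual key assignment may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy illustration of "redistribute into N buckets" (hypothetical, not Beam code).
 * Each element gets a bucket key in [0, numBuckets); grouping by that key
 * yields at most numBuckets groups, i.e. at most numBuckets parallel writers.
 */
public class BucketSketch {

  // Assigns bucket keys round-robin and groups elements by bucket.
  static <T> Map<Integer, List<T>> redistribute(List<T> elements, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    Map<Integer, List<T>> buckets = new TreeMap<>();
    int next = 0;
    for (T element : elements) {
      buckets.computeIfAbsent(next, k -> new ArrayList<>()).add(element);
      next = (next + 1) % numBuckets; // cycle through 0..numBuckets-1
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      rows.add("row-" + i);
    }
    // With N = 2, only two groups exist, so at most two concurrent writes.
    Map<Integer, List<String>> buckets = redistribute(rows, 2);
    System.out.println("groups: " + buckets.size());
    for (Map.Entry<Integer, List<String>> e : buckets.entrySet()) {
      System.out.println("bucket " + e.getKey() + ": " + e.getValue().size() + " rows");
    }
  }
}
```

In a distributed runner, the grouping step is what moves elements between workers, which is the part the shuffle question above is about; this sketch says nothing about how Dataflow schedules those groups or how autoscaling interacts with them.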
