Re: [I] [Bug]: IcebergIO - Write performance issues [beam]

via GitHub Fri, 11 Oct 2024 08:26:49 -0700


DanielMorales9 commented on issue #32746:
URL: https://github.com/apache/beam/issues/32746#issuecomment-2407648426


   Yes, I suffer the same parallelism problem:
   <img width="784" alt="Screenshot 2024-10-11 at 16 54 27" src="https://github.com/user-attachments/assets/33b389fd-1228-4e42-9e80-d2c68ad1af20">
   
   > - Adding .apply(Redistribute.<Row>arbitrarily().withNumBuckets(<N>)) before the write step, reducing the parallelism to N
   
   Is this similar to Spark's `repartition`? Does it shuffle the data? How will it work with autoscaling enabled?
   
   > Adding .apply(Redistribute.<Row>arbitrarily().withNumBuckets(<N>)) before the write step, reducing the parallelism to N
   > Use the --numberOfWorkerHarnessThreads=N pipeline option, which sets an upper bound on the number of threads per worker
   
   Right now, I have autoscaling disabled, and I will try setting `N=2` and `machineType=n1-standard-4`.
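   
   For reference, the setup described above might translate to pipeline options roughly as follows. This is an untested sketch with several assumptions: it presumes the Java SDK on the Dataflow runner, it reads `N=2` as the harness thread count (it could equally be meant as the `withNumBuckets` value), and `<num-workers>` is a placeholder because no worker count is given here.
   
   ```
   --runner=DataflowRunner
   --autoscalingAlgorithm=NONE
   --numWorkers=<num-workers>
   --workerMachineType=n1-standard-4
   --numberOfWorkerHarnessThreads=2
   ```
   
   With autoscaling off, the upper bound on concurrent threads would be the worker count times the per-worker thread limit, so both values matter when judging how many parallel writers reach the Iceberg table.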
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

