lukecwik commented on code in PR #22645:
URL: https://github.com/apache/beam/pull/22645#discussion_r941724720


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java:
##########
@@ -954,7 +949,17 @@ public void processElement(ProcessContext c, BoundedWindow 
window) throws Except
         writeOrClose(writer, getDynamicDestinations().formatRecord(input));
       }
 
-      // Close all writers.
+      // Clean-up any prior writers that were being closed as part of this 
bundle before

Review Comment:
   Note that the customer was limiting the close calls to 2 at a time max.
   
   I didn't prefer adding the knob because I suspect the `close` call will most 
of the time be done when the other IO operations are ongoing during the write 
so we get most of the parallelism this way as it would be rare for the `close` 
call to span more than the next element's write calls. Was there a benchmark 
that you used to validate the improvement for the original change so I could 
test this?
   
   I can add the additional knob for users to control but I had suggested to 
others that we should bound the amount of buffer memory we use globally within 
the process to maximize parallelism and not have to write stuff like this. I 
did this for the combiner table in 
https://github.com/apache/beam/commit/5b81d140636e3fa774610aeb8a8896d02696b707 
but we should extend everywhere.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to