lukecwik commented on code in PR #22645:
URL: https://github.com/apache/beam/pull/22645#discussion_r941724720
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java:
##########
@@ -954,7 +949,17 @@ public void processElement(ProcessContext c, BoundedWindow
window) throws Except
writeOrClose(writer, getDynamicDestinations().formatRecord(input));
}
- // Close all writers.
+ // Clean-up any prior writers that were being closed as part of this
bundle before
Review Comment:
Note that the customer was limiting the close calls to 2 at a time max.
I didn't prefer adding the knob because I suspect the `close` call will most
of the time be done when the other IO operations are ongoing during the write
so we get most of the parallelism this way as it would be rare for the `close`
call to span more than the next element's write calls. Was there a benchmark
that you used to validate the improvement for the original change so I could
test this?
I can add the additional knob for users to control but I had suggested to
others that we should bound the amount of buffer memory we use globally within
the process to maximize parallelism and not have to write stuff like this. I
did this for the combiner table in
https://github.com/apache/beam/commit/5b81d140636e3fa774610aeb8a8896d02696b707
but we should extend everywhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]