robertwb commented on code in PR #28045:
URL: https://github.com/apache/beam/pull/28045#discussion_r1304948776
##########
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/SourceInputFormat.java:
##########
@@ -108,12 +110,27 @@ public float getAverageRecordWidth() {
return null;
}
+ private long getDesiredSizeBytes(int numSplits) throws Exception {
+ long totalSize = initialSource.getEstimatedSizeBytes(options);
+ long defaultSplitSize = totalSize / numSplits;
+ if (initialSource instanceof FileBasedSource) {
+ long maxSplitSize = options.as(FlinkPipelineOptions.class).getFileInputSplitMaxSizeBytes();
Review Comment:
Could we let a value of 0 or -1 disable this new behavior and restore the
original?
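The reviewer's suggestion can be sketched as follows. This is a minimal illustration, not the actual PR code: the class and method names (`SplitSizing`, `desiredSplitSize`) are hypothetical, and the method takes the already-estimated total size and the option value as plain parameters rather than reading them from the Beam `Source` and `PipelineOptions`. The point is only the sentinel check: a max-split-size of 0 or -1 skips the cap and falls back to the original `totalSize / numSplits` division.

```java
/** Hypothetical sketch of the reviewer's suggested sentinel behavior. */
public class SplitSizing {

  /**
   * Returns the desired split size in bytes. A maxSplitSizeBytes of 0 or -1
   * (more generally, any non-positive value) disables the cap, restoring the
   * original behavior of dividing the total size evenly across splits.
   */
  static long desiredSplitSize(long totalSize, int numSplits, long maxSplitSizeBytes) {
    long defaultSplitSize = totalSize / numSplits;
    if (maxSplitSizeBytes <= 0) {
      // Sentinel: option unset or explicitly disabled -> old behavior.
      return defaultSplitSize;
    }
    // New behavior: never let a single split exceed the configured maximum.
    return Math.min(defaultSplitSize, maxSplitSizeBytes);
  }

  public static void main(String[] args) {
    // Cap disabled: the original even division is used.
    System.out.println(desiredSplitSize(1024, 4, 0));   // 256
    // Cap smaller than the even split: the cap wins.
    System.out.println(desiredSplitSize(1024, 4, 100)); // 100
  }
}
```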
##########
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java:
##########
@@ -313,6 +313,12 @@ public interface FlinkPipelineOptions
void setFlinkConfDir(String confDir);
+ @Description("Set the maximum size of input split when data is read from a filesystem.")
+ @Default.Long(128 * 1024 * 1024)
Review Comment:
We could let the default be the old behavior, and only set this for
pipelines that are affected. If you have good evidence that this is generally
helpful and rarely hurtful, providing a good default makes sense.
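Combining both review suggestions, the option declaration might look like the fragment below. This is a sketch, not the PR's actual code: it assumes the sentinel convention from the other comment (non-positive means disabled) and changes the default from `128 * 1024 * 1024` to `0` so that pipelines keep the old splitting behavior unless they opt in. The annotations `@Description` and `@Default.Long` are Beam's standard `PipelineOptions` annotations.

```java
@Description(
    "Maximum size of an input split when data is read from a filesystem. "
        + "A value of 0 or -1 disables the cap and restores the original "
        + "behavior of dividing the estimated total size evenly across splits.")
@Default.Long(0) // Old behavior by default; affected pipelines opt in explicitly.
Long getFileInputSplitMaxSizeBytes();

void setFileInputSplitMaxSizeBytes(Long fileInputSplitMaxSizeBytes);
```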
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]