jackye1995 commented on a change in pull request #2984: URL: https://github.com/apache/iceberg/pull/2984#discussion_r690063113
########## File path: flink/src/main/java/org/apache/iceberg/flink/source/FlinkInputFormat.java ##########

@@ -72,12 +62,7 @@ public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
   @Override
   public FlinkInputSplit[] createInputSplits(int minNumSplits) throws IOException {
-    // Called in Job manager, so it is OK to load table from catalog.
-    tableLoader.open();

Review comment:

Not an expert in Flink, but I agree this is a behavior change we should be cautious about. Currently the input splits are created from a freshly loaded table via the serializable table loader, which means the table can be updated while the job runs and produce a different array of splits than a read-only copy would. I am not sure which behavior is correct; using a read-only copy feels more valid for ensuring snapshot isolation. But if the Flink source is supposed to be reused to always reflect the latest view of the data, then using the table loader becomes necessary.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
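The distinction the comment draws can be sketched with a small toy model. This is not the Iceberg/Flink API: `loadLatestSnapshot`, `planSplits`, and the snapshot counter are all hypothetical stand-ins, meant only to show how planning splits from a freshly loaded table (latest view) can diverge from planning against a read-only copy captured up front (snapshot isolation) when a concurrent commit lands in between.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SplitPlanningSketch {
  // Stands in for the catalog-backed table state; each "load" sees the latest snapshot.
  static final AtomicLong currentSnapshotId = new AtomicLong(1);

  // Hypothetical analogue of opening the table loader and loading the table fresh.
  static long loadLatestSnapshot() {
    return currentSnapshotId.get();
  }

  // Toy split planning: one split per snapshot id, so the difference is observable.
  static List<String> planSplits(long snapshotId) {
    return List.of("split-for-snapshot-" + snapshotId);
  }

  public static void main(String[] args) {
    long frozen = loadLatestSnapshot();   // read-only copy captured at job submission

    currentSnapshotId.set(2);             // a concurrent commit updates the table

    // Snapshot isolation: splits reflect the state captured up front.
    List<String> fromFrozenCopy = planSplits(frozen);
    // Latest view: a fresh load at planning time sees the new commit.
    List<String> fromFreshLoad = planSplits(loadLatestSnapshot());

    System.out.println(fromFrozenCopy);   // [split-for-snapshot-1]
    System.out.println(fromFreshLoad);    // [split-for-snapshot-2]
  }
}
```

Under this toy model, the two strategies only disagree when a commit happens between job submission and split planning, which is exactly the window the review comment is concerned about.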