jackye1995 commented on a change in pull request #2984:
URL: https://github.com/apache/iceberg/pull/2984#discussion_r690063113



##########
File path: 
flink/src/main/java/org/apache/iceberg/flink/source/FlinkInputFormat.java
##########
@@ -72,12 +62,7 @@ public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
 
   @Override
  public FlinkInputSplit[] createInputSplits(int minNumSplits) throws IOException {
-    // Called in Job manager, so it is OK to load table from catalog.
-    tableLoader.open();

Review comment:
       Not an expert in Flink, but I agree this is a behavior change we should be cautious about. Currently the input splits are created from a freshly loaded table obtained through the serializable table loader. This means the table can be updated while the job runs and produce a different array of splits than a read-only copy would.
   
   I am not sure which behavior is correct, though. Using a read-only copy feels more valid to me because it preserves snapshot isolation. But if the Flink source is supposed to be reused and always reflect the latest view of the data, then using the table loader becomes necessary.
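To make the distinction concrete, here is a minimal sketch of the two planning strategies. All class and method names here are hypothetical stand-ins, not Iceberg or Flink APIs: `MutableTable` models a catalog-backed table whose current snapshot can advance between calls, `FreshLoadPlanner` mimics reloading the table on every split-planning call (the `tableLoader.open()` behavior), and `PinnedSnapshotPlanner` mimics planning against a read-only copy captured up front.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a catalog-backed table whose current
// snapshot can advance due to concurrent commits.
class MutableTable {
    private final AtomicLong currentSnapshotId = new AtomicLong(1);

    long currentSnapshotId() {
        return currentSnapshotId.get();
    }

    void commit() {
        currentSnapshotId.incrementAndGet();
    }
}

// Strategy A: reload the table on every planning call, so splits are
// computed against whatever snapshot is current at that moment.
class FreshLoadPlanner {
    private final MutableTable catalogTable;

    FreshLoadPlanner(MutableTable table) {
        this.catalogTable = table;
    }

    long planAgainstSnapshot() {
        return catalogTable.currentSnapshotId(); // sees the latest snapshot
    }
}

// Strategy B: pin the snapshot observed at construction time, so every
// planning call is isolated from later commits.
class PinnedSnapshotPlanner {
    private final long pinnedSnapshotId;

    PinnedSnapshotPlanner(MutableTable table) {
        this.pinnedSnapshotId = table.currentSnapshotId();
    }

    long planAgainstSnapshot() {
        return pinnedSnapshotId; // always the same snapshot
    }
}

public class SplitPlanningDemo {
    public static void main(String[] args) {
        MutableTable table = new MutableTable();
        FreshLoadPlanner fresh = new FreshLoadPlanner(table);
        PinnedSnapshotPlanner pinned = new PinnedSnapshotPlanner(table);

        long before = fresh.planAgainstSnapshot();       // snapshot 1
        table.commit();                                  // a concurrent write lands
        long afterFresh = fresh.planAgainstSnapshot();   // snapshot 2: splits may differ
        long afterPinned = pinned.planAgainstSnapshot(); // still snapshot 1: isolated

        System.out.println(before + " " + afterFresh + " " + afterPinned);
    }
}
```

The sketch shows why the two behaviors diverge only when a commit lands between planning calls: the fresh-load planner observes the new snapshot, while the pinned planner keeps returning the one it captured.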




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


