EricJoy2048 commented on code in PR #2865:
URL: https://github.com/apache/incubator-seatunnel/pull/2865#discussion_r982216383


##########
seatunnel-connectors-v2/connector-iceberg/src/main/java/org/apache/seatunnel/connectors/seatunnel/iceberg/source/enumerator/IcebergStreamSplitEnumerator.java:
##########
@@ -51,35 +50,41 @@ public IcebergStreamSplitEnumerator(@NonNull SourceSplitEnumerator.Context<Icebe
         }
     }
 
+    @Override
+    public synchronized void run() {
+        loadNewSplitsToPendingSplits(icebergTableLoader.loadTable());
+        assignPendingSplits(context.registeredReaders());
+    }
+
     @Override
     public IcebergSplitEnumeratorState snapshotState(long checkpointId) throws Exception {
         return new IcebergSplitEnumeratorState(enumeratorPosition.get(), pendingSplits);
     }
 
     @Override
-    public void handleSplitRequest(int subtaskId) {
-        synchronized (this) {
-            if (pendingSplits.isEmpty() ||
-                pendingSplits.get(subtaskId) == null) {
-                refreshPendingSplits();
-            }
-            assignPendingSplits(Collections.singleton(subtaskId));
+    public synchronized void handleSplitRequest(int subtaskId) {
+        if (pendingSplits.isEmpty() ||

Review Comment:
   I think we need to acquire the checkpoint lock here. I found two problems in
   this code.
   
   1. From this code we can see that once a split has been sent to a reader, it
   is removed from `pendingSplits`. We also store `pendingSplits` when the
   `snapshotState` method is called. We must ensure that updating
   `pendingSplits` and storing `pendingSplits` to HDFS are synchronized.
   
   ```
   protected void assignPendingSplits(Set<Integer> pendingReaders) {
       log.debug("Assign pendingSplits to readers {}", pendingReaders);

       for (int pendingReader : pendingReaders) {
           List<IcebergFileScanTaskSplit> pendingAssignmentForReader = pendingSplits.remove(pendingReader);
           if (pendingAssignmentForReader != null && !pendingAssignmentForReader.isEmpty()) {
               log.info("Assign splits {} to reader {}",
                   pendingAssignmentForReader, pendingReader);
               try {
                   context.assignSplit(pendingReader, pendingAssignmentForReader);
               } catch (Exception e) {
                   log.error("Failed to assign splits {} to reader {}",
                       pendingAssignmentForReader, pendingReader, e);
                   pendingSplits.put(pendingReader, pendingAssignmentForReader);
               }
           }
       }
   }
   ```
   
   2. Suppose the enumerator sends split#1 to a reader and a checkpoint occurs
   right after the send completes, so the enumerator state written to HDFS no
   longer contains split#1. If the enumerator task then fails and is restored,
   split#1 cannot be found in its restored state. Meanwhile, the reader receives
   split#1 and updates its own `pendingSplits`. Can `Reader#snapshotState`
   execute before the reader updates `pendingSplits`? If this happens, then on
   the next restore split#1 is found neither in the restored enumerator nor in
   the restored reader. As a result, split#1 is lost.
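
For the first point, one way to keep the `pendingSplits` update and the snapshot consistent is to guard both with the same lock. Below is a minimal sketch, assuming a hypothetical `LockedPendingSplits` class and plain `String` splits in place of `IcebergFileScanTaskSplit`. It is not the SeaTunnel API, and I have not confirmed how the engine exposes a checkpoint lock to the enumerator:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the enumerator state; all names are illustrative.
class LockedPendingSplits {
    // One lock shared by assignment and snapshot (plays the "checkpoint lock" role).
    private final Object checkpointLock = new Object();
    private final Map<Integer, List<String>> pendingSplits = new HashMap<>();

    void addPending(int reader, String split) {
        synchronized (checkpointLock) {
            pendingSplits.computeIfAbsent(reader, k -> new ArrayList<>()).add(split);
        }
    }

    // Removes and returns the splits for a reader. Because the lock is held,
    // the removal cannot interleave with a concurrent snapshot.
    List<String> assign(int reader) {
        synchronized (checkpointLock) {
            List<String> splits = pendingSplits.remove(reader);
            return splits == null ? new ArrayList<>() : splits;
        }
    }

    // Returns a deep copy, so later mutations do not change what gets persisted.
    Map<Integer, List<String>> snapshotState() {
        synchronized (checkpointLock) {
            Map<Integer, List<String>> copy = new HashMap<>();
            pendingSplits.forEach((reader, splits) -> copy.put(reader, new ArrayList<>(splits)));
            return copy;
        }
    }
}
```

This only covers the first point. The ordering between the reader's state update and `Reader#snapshotState` in the second point still needs to be checked separately.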


