JeremyXin commented on code in PR #8453:
URL: https://github.com/apache/seatunnel/pull/8453#discussion_r1903592674


##########
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/split/FileSourceSplitEnumerator.java:
##########
@@ -107,8 +110,7 @@ private void assignSplit(int taskId) {
         context.assignSplit(taskId, currentTaskSplits);
         // save the state of assigned splits
         assignedSplit.addAll(currentTaskSplits);
-        // remove the assigned splits from pending splits
-        currentTaskSplits.forEach(split -> pendingSplit.remove(split));

Review Comment:
   Because the principle of the poll-based file allocation strategy I consider 
is based on the location of the split in the collection of pendingSplit files, 
using assignCount and parallelism modulus to determine which task should be 
assigned to the split. The premise of this method is that the pendingSplit set 
remains unchanged during allocation. If you remove allocated splits from the 
pendingSplit collection, it can cause a change in the location of other splits, 
potentially changing their owners and causing problems.
   
   I know that the original purpose of adding this line of code is to prevent 
the double allocation of split, but in my opinion, the submitted code 
calculates the module based on assignCount and parallelism, and only when the 
result is consistent with the taskId value, the split is assigned to the 
corresponding task, which will not cause the double allocation of files. It was 
checked in unit tests.
   
   So that's why I deleted this line of code. If I need to keep it, I can try 
to redesign the code flow. If you have any questions, please contact me in 
time. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to