JeremyXin commented on code in PR #8453:
URL: https://github.com/apache/seatunnel/pull/8453#discussion_r1905566166
##########
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/split/FileSourceSplitEnumerator.java:
##########
@@ -91,13 +93,14 @@ private void assignSplit(int taskId) {
ArrayList<FileSourceSplit> currentTaskSplits = new ArrayList<>();
if (context.currentParallelism() == 1) {
// if parallelism == 1, we should assign all the splits to reader
- currentTaskSplits.addAll(pendingSplit);
+ currentTaskSplits.addAll(allSplit);
} else {
- // if parallelism > 1, according to hashCode of split's id to determine whether to
+ // if parallelism > 1, according to polling strategy to determine whether to
// allocate the current task
- for (FileSourceSplit fileSourceSplit : pendingSplit) {
+ assignCount.set(0);
+ for (FileSourceSplit fileSourceSplit : allSplit) {
int splitOwner =
- getSplitOwner(fileSourceSplit.splitId(), context.currentParallelism());
+ getSplitOwner(assignCount.getAndIncrement(), context.currentParallelism());
Review Comment:
Could you explain in more detail under what circumstances ParallelSource causes
files to be read repeatedly? I will try to reproduce the issue and fix it. So
far, the test cases I have written have not shown this behavior.
Also, do you have any suggestions for changes? Thanks for your review.
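To make the two strategies in the diff concrete, here is a minimal sketch under stated assumptions. Split ids are plain Strings. The old owner is computed as a non-negative `hashCode() % parallelism`. The new owner is assumed to be `counter % parallelism`, with the counter reset to 0 for every task, as `assignCount.set(0)` does. `SplitAssignmentSketch`, `hashOwner`, `roundRobinOwner` and `assignRoundRobin` are hypothetical names, not SeaTunnel's actual `getSplitOwner`. The sketch checks that each split goes to exactly one task when the split list is iterated in the same order on every call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not SeaTunnel's FileSourceSplitEnumerator.
public class SplitAssignmentSketch {

    // Old strategy (removed lines): owner derived from the split id's hashCode.
    // Masking with Integer.MAX_VALUE keeps the modulo result non-negative.
    static int hashOwner(String splitId, int parallelism) {
        return (splitId.hashCode() & Integer.MAX_VALUE) % parallelism;
    }

    // New strategy (added lines), ASSUMED to reduce the counter modulo parallelism.
    static int roundRobinOwner(int index, int parallelism) {
        return index % parallelism;
    }

    // Splits that task `taskId` receives under round-robin. The counter restarts
    // at 0 on every call, mirroring assignCount.set(0) in the diff.
    static List<String> assignRoundRobin(List<String> allSplits, int taskId, int parallelism) {
        List<String> mine = new ArrayList<>();
        int count = 0;
        for (String split : allSplits) {
            if (roundRobinOwner(count++, parallelism) == taskId) {
                mine.add(split);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("a.csv", "b.csv", "c.csv", "d.csv", "e.csv");
        int parallelism = 2;
        for (int task = 0; task < parallelism; task++) {
            // Prints each task's share; the shares are disjoint and cover all splits.
            System.out.println("task " + task + " -> " + assignRoundRobin(splits, task, parallelism));
        }
    }
}
```

Because each task recomputes its share from position 0, the result is deterministic only if the split collection yields the same order on every call. If the order differed between calls for different tasks, the same split could be assigned to two tasks, or to none.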