RexXiong commented on code in PR #2373:
URL: https://github.com/apache/celeborn/pull/2373#discussion_r1914748980
##########
client/src/main/java/org/apache/celeborn/client/ShuffleClient.java:
##########
@@ -252,6 +257,8 @@ public abstract CelebornInputStream readPartition(
ExceptionMaker exceptionMaker,
ArrayList<PartitionLocation> locations,
ArrayList<PbStreamHandler> streamHandlers,
+ Map<String, Set<PushFailedBatch>> failedBatchSetMap,
Review Comment:
> For a non skewed stage, we handle this right
In a non-skewed stage this is unnecessary: the reducer reads data by map
range, so it processes each map task's entire dataset and can deduplicate
identical batches itself. In a skewed stage, however, each reducer reads only
part of the data, in chunks that may originate from any of the map tasks.
Identical batches can then appear in different chunks, and a reducer cannot
deduplicate them unless it knows which batches must not be read. That is why
every map task should report to the LifecycleManager the failedBatches that
must not be read.
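
To make the skewed case concrete, here is a minimal sketch of the filtering
idea, not the code in this PR. `BatchKey` is a hypothetical stand-in for
`PushFailedBatch` (its fields are assumed), `readChunk` is an illustrative
helper, and the location ids `p0-0` and `p0-1` are made up. It assumes a batch
whose push was reported as failed may still have been written to the original
location before being re-pushed elsewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class SkewReadSketch {

  // Hypothetical stand-in for PushFailedBatch; the fields are assumptions.
  static final class BatchKey {
    final int mapId;
    final int attemptId;
    final int batchId;

    BatchKey(int mapId, int attemptId, int batchId) {
      this.mapId = mapId;
      this.attemptId = attemptId;
      this.batchId = batchId;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof BatchKey)) return false;
      BatchKey other = (BatchKey) o;
      return mapId == other.mapId
          && attemptId == other.attemptId
          && batchId == other.batchId;
    }

    @Override
    public int hashCode() {
      return Objects.hash(mapId, attemptId, batchId);
    }
  }

  // Keeps only the batches of one chunk that were NOT reported as failed
  // for the location the chunk was read from.
  static List<BatchKey> readChunk(
      String locationId,
      List<BatchKey> chunk,
      Map<String, Set<BatchKey>> failedBatchSetMap) {
    Set<BatchKey> failed =
        failedBatchSetMap.getOrDefault(locationId, Collections.emptySet());
    List<BatchKey> kept = new ArrayList<>();
    for (BatchKey batch : chunk) {
      if (!failed.contains(batch)) {
        kept.add(batch);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    BatchKey b0 = new BatchKey(0, 0, 0);
    BatchKey b1 = new BatchKey(0, 0, 1);

    // b1 landed in "p0-0", but its push was reported as failed,
    // so the map task pushed it again to "p0-1".
    Map<String, Set<BatchKey>> failedBatchSetMap = new HashMap<>();
    failedBatchSetMap.put("p0-0", new HashSet<>(Arrays.asList(b1)));

    // Two sub-reducers of the skewed partition read different chunks.
    List<BatchKey> first = readChunk("p0-0", Arrays.asList(b0, b1), failedBatchSetMap);
    List<BatchKey> second = readChunk("p0-1", Arrays.asList(b1), failedBatchSetMap);

    // Prints 2: b0 and b1 are each read exactly once across both chunks.
    System.out.println(first.size() + second.size());
  }
}
```

Without the failed-batch set, the two sub-reducers would each return `b1`, and
neither could detect the duplicate because neither sees the other's chunk.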