RexXiong commented on code in PR #2373:
URL: https://github.com/apache/celeborn/pull/2373#discussion_r1881706196
##########
client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java:
##########
@@ -65,26 +69,54 @@ public static CelebornInputStream create(
ExceptionMaker exceptionMaker,
MetricsCallback metricsCallback)
throws IOException {
- if (locations == null || locations.size() == 0) {
+ if (locations == null || locations.isEmpty()) {
return emptyInputStream;
} else {
- return new CelebornInputStreamImpl(
- conf,
- clientFactory,
- shuffleKey,
- locations,
- streamHandlers,
- attempts,
- attemptNumber,
- startMapIndex,
- endMapIndex,
- fetchExcludedWorkers,
- shuffleClient,
- appShuffleId,
- shuffleId,
- partitionId,
- exceptionMaker,
- metricsCallback);
+ // if startMapIndex > endMapIndex, means partition is skew partition.
+ // locations will split to sub-partitions with startMapIndex size.
+ boolean splitSkewPartitionWithoutMapRange =
Review Comment:
It would be better to extract this check into a util method.
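A minimal sketch of what such a helper could look like. The class name, method name, and signature are hypothetical, and the condition is only inferred from the code comment in the hunk above (startMapIndex > endMapIndex marks a skew partition); the actual expression is truncated in the diff and may include further checks.

```java
/** Hypothetical helper class; not part of the PR as shown. */
final class SkewPartitionReadUtils {
  private SkewPartitionReadUtils() {}

  /**
   * Returns true when the map range signals a skew-partition split.
   *
   * Assumption: based solely on the diff comment, startMapIndex > endMapIndex
   * means the partition is a skew partition. The real check may differ.
   */
  static boolean isSplitSkewPartitionWithoutMapRange(int startMapIndex, int endMapIndex) {
    return startMapIndex > endMapIndex;
  }
}
```

The call site in create(...) would then reduce to a single call, which keeps the factory method readable and makes the condition testable on its own.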
##########
client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java:
##########
@@ -421,6 +509,15 @@ private PartitionReader createReader(
logger.debug("Create reader for location {}", location);
StorageInfo storageInfo = location.getStorageInfo();
+
+ int startChunkIndex = -1;
Review Comment:
Celeborn should not attempt to retry replication when it encounters
SkewedPartitionRead. We may therefore need to introduce a new flag that
controls whether SkewedPartitionRead is disabled when replication is enabled.
Additionally, if SkewedPartitionRead occurs, we should ignore the replica
partitions.
##########
client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java:
##########
@@ -443,7 +540,9 @@ private PartitionReader createReader(
endMapIndex,
fetchChunkRetryCnt,
fetchChunkMaxRetry,
- callback);
+ callback,
Review Comment:
We may support LocalPartitionReader and DfsPartitionReader for
SkewedPartitionRead later.