FMX commented on code in PR #2549:
URL: https://github.com/apache/celeborn/pull/2549#discussion_r1680401405
##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -4864,6 +4866,23 @@ object CelebornConf extends Logging {
.booleanConf
.createWithDefault(false)
+ val CLIENT_CHUNK_PREFETCH_ENABLED: ConfigEntry[Boolean] =
+ buildConf("celeborn.client.chunk.prefetch.enabled")
+ .categories("client")
+ .doc("Whether to enable chunk prefetch when creating CelebornInputStream.")
+ .version("0.6.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val CLIENT_INPUTSTREAM_CREATION_WINDOW: ConfigEntry[Int] =
+ buildConf("celeborn.client.inputStream.creation.window")
+ .categories("client")
+ .doc(s"Window size that CelebornShuffleReader pre-creates CelebornInputStreams, for coalesced scenario" +
Review Comment:
Although this PR is suitable for coalesced scenarios, there is no limit for a
task that reads multiple locations. I think this might cause unexpectedly high
memory consumption.
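The concern above could be addressed by capping how many input streams a single task pre-creates, no matter how many partition locations it reads. A minimal sketch of that idea (the class and `maxPrefetchStreams` cap are hypothetical illustrations, not existing Celeborn code or config):

```java
// Sketch: bound the number of pre-created input streams per task.
// `maxPrefetchStreams` is a hypothetical per-task cap, not an actual
// Celeborn config entry.
public final class PrefetchWindow {
    private final int maxPrefetchStreams;

    public PrefetchWindow(int maxPrefetchStreams) {
        this.maxPrefetchStreams = maxPrefetchStreams;
    }

    /** Returns the exclusive end of the prefetch window. */
    public int windowEnd(int startPartition, int endPartition) {
        // Never pre-create more than maxPrefetchStreams streams, even when
        // a coalesced task reads many partition locations.
        return Math.min(endPartition, startPartition + maxPrefetchStreams);
    }

    public static void main(String[] args) {
        PrefetchWindow w = new PrefetchWindow(16);
        System.out.println(w.windowEnd(0, 100)); // capped at 16
        System.out.println(w.windowEnd(0, 8));   // only 8 partitions to read
    }
}
```

This mirrors the `Math.min`-bounded range already visible in the CelebornShuffleReader hunk below, but makes the cap explicit and independent of how many locations the task covers.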
##########
client-spark/spark-3/src/main/scala/org/apache/spark/shuffle/celeborn/CelebornShuffleReader.scala:
##########
@@ -214,46 +219,71 @@ class CelebornShuffleReader[K, C](
locations,
streamHandlers,
fileGroups.mapAttempts,
- metricsCallback)
+ metricsCallback,
+ chunkPrefetchEnabled)
+ streams.put(partitionId, inputStream)
} catch {
case e: IOException =>
logError(s"Exception caught when readPartition $partitionId!", e)
exceptionRef.compareAndSet(null, e)
- null
case e: Throwable =>
logError(s"Non IOException caught when readPartition $partitionId!", e)
exceptionRef.compareAndSet(null, new CelebornIOException(e))
- null
}
- } else null
+ }
}
+ (startPartition until Math.min(
Review Comment:
Adding a limit on how much a task should prefetch, to control memory usage,
would be better.
16 may not be suitable for all scenarios, and this config will be hacky to
tweak.
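One way to avoid a fixed window of 16 is to gate prefetch with a semaphore, so at most N streams are being created or held at once and memory stays bounded regardless of partition count. A rough sketch of that alternative (hypothetical class, not Celeborn's actual implementation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: bound concurrent prefetches with a semaphore instead of a fixed
// window size. Hypothetical illustration, not Celeborn code.
public final class BoundedPrefetcher {
    private final Semaphore permits;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxObserved = new AtomicInteger();

    public BoundedPrefetcher(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Prefetch one partition's stream; blocks once the cap is reached. */
    public void prefetch(int partitionId, Runnable createStream) throws InterruptedException {
        permits.acquire(); // released when the stream work finishes
        pool.submit(() -> {
            int cur = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(cur, Math::max);
            try {
                createStream.run();
            } finally {
                inFlight.decrementAndGet();
                permits.release();
            }
        });
    }

    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BoundedPrefetcher p = new BoundedPrefetcher(4);
        for (int i = 0; i < 32; i++) {
            p.prefetch(i, () -> { });
        }
        p.shutdown();
        // maxObserved never exceeds the cap of 4
        System.out.println(p.maxObserved.get() <= 4);
    }
}
```

A permit-based cap self-tunes to memory pressure per task rather than forcing one window size onto all scenarios, which is the reviewer's point about a hard-coded 16 being hacky to tweak.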
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]