FMX commented on code in PR #2009:
URL:
https://github.com/apache/incubator-celeborn/pull/2009#discussion_r1366874319
##########
METRICS.md:
##########
@@ -95,6 +95,8 @@ Here is an example of grafana dashboard importing.
| DiskBuffer | worker | Disk buffers
are part of netty used memory, means data need to write to disk but haven't
been written to disk. |
| PausePushData | worker |
PausePushData means the count of worker stopped receiving data from client.
|
| PausePushDataAndReplicate | worker |
PausePushDataAndReplicate means the count of worker stopped receiving data from
client and other workers. |
+| ActiveShuffleSize | worker |
The active shuffle size of peer worker.
|
Review Comment:
Actually, this is the shuffle file size of a worker including master replica
and slave replica. So the "peer" is not suitable.
##########
METRICS.md:
##########
@@ -95,6 +95,8 @@ Here is an example of grafana dashboard importing.
| DiskBuffer | worker | Disk buffers
are part of netty used memory, means data need to write to disk but haven't
been written to disk. |
| PausePushData | worker |
PausePushData means the count of worker stopped receiving data from client.
|
| PausePushDataAndReplicate | worker |
PausePushDataAndReplicate means the count of worker stopped receiving data from
client and other workers. |
+| ActiveShuffleSize | worker |
The active shuffle size of peer worker.
|
+| ActiveShuffleFileCount | worker |
The active shuffle file count of peer worker.
|
Review Comment:
Ditto.
##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/storage/StorageManager.scala:
##########
@@ -845,6 +845,9 @@ final private[worker] class StorageManager(conf:
CelebornConf, workerSource: Abs
fileInfos.values().asScala.map(_.values().asScala.map(_.getBytesFlushed).sum).sum
}
+ def getActiveShuffleFileCount(): Long = {
+ fileInfos.values().size()
Review Comment:
`private val fileInfos =
JavaUtils.newConcurrentHashMap[String, ConcurrentHashMap[String,
FileInfo]]()`
As you can see the the key is shuffleKey.
So this should be
`fileInfos.asScala.values.map(_.size()).sum`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]