advancedxy commented on code in PR #494:
URL: https://github.com/apache/incubator-uniffle/pull/494#discussion_r1072085066
##########
server/src/main/java/org/apache/uniffle/server/ShuffleTaskManager.java:
##########
@@ -574,11 +584,20 @@ public void removeResources(String appId) {
new AppPurgeEvent(appId, getUserByAppId(appId), new
ArrayList<>(shuffleToCachedBlockIds.keySet()))
);
}
+ if (shuffleTaskInfo.hasHugePartition()) {
+ ShuffleServerMetrics.gaugeAppWithHugePartitionNum.dec();
+
ShuffleServerMetrics.gaugeHugePartitionNum.dec(shuffleTaskInfo.getHugePartitionSize());
+ }
LOG.info("Finish remove resource for appId[" + appId + "] cost " +
(System.currentTimeMillis() - start) + " ms");
}
public void refreshAppId(String appId) {
- shuffleTaskInfos.computeIfAbsent(appId, x -> new
ShuffleTaskInfo()).setCurrentTimes(System.currentTimeMillis());
+ shuffleTaskInfos.computeIfAbsent(
+ appId,
+ x -> {
+ ShuffleServerMetrics.counterTotalAppNum.inc();
Review Comment:
> This might be inaccurate
This problem is not addressed.
`refreshAppId` could be called by multiple calls. For example, a spark app
finishes an stage(with shuffle), then some local computing then submit new
stage(with shuffle). The shuffle server will account this app multiple times.
I can think of two proposals to fix this problem:
1. Modify the `ShuffleRegisterRequest` in proto to include a `firstShuffle`
boolean field to indicate this is the first shuffle register, mapping to the
app.
2. Shuffle server maintains another data structure to maintain which appIds
has been seen. But it's not evicted until the configured size is reached, let's
say 100_000?
Both two proposals require the metric inc should be happened in
`registerShuffle` call.
The first one should fix the problem permanently, but require bigger effort.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]