xianjingfeng commented on PR #2022:
URL: 
https://github.com/apache/incubator-uniffle/pull/2022#issuecomment-2276060863

   > > > Like having the shuffle servers report all their running 
applications back to the coordinator via RPC calls? Just like how the heartbeat 
is made.
   > > 
   > > The amount of data is rather large. The data for one application would 
be as follows.
   > > 
   > > ```java
   > > public class ApplicationMetricsVo {
   > >   private long totalDataSize;
   > >   private long inMemoryDataSize;
   > >   private long onLocalFileDataSize;
   > >   private long onHadoopDataSize;
   > >   private Map<Integer, Map<Integer, Long>> partitionDataSizes;
   > >   private boolean existHugePartition;
   > > }
   > > ```
   > 
   > Hmmm, looks pretty compact to me. Except for the partition data sizes, I 
don't think this info is much different from the heartbeat message.
   > 
   > If you haven't finished all the related code, I would suggest implementing 
a new `ReportRunningApplications` service point in the coordinator instead, 
which I think would require less code.
   > 
   > The shuffle server could still export its internal metrics/running 
applications via the Web interface.
   
   If this method is used, each shuffle server would need to report 
information for hundreds of applications and tens of thousands of partitions. 
This may cause performance issues for the coordinator.
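
As a rough back-of-envelope sketch of that volume: the code below counts only the raw field bytes of the structure quoted above, ignoring serialization and map overhead. The counts used in `main` (300 applications, 30,000 partition entries, 500 shuffle servers) are assumed values for illustration, not measurements, and the class and method names are hypothetical.

```java
public class ReportSizeEstimate {
  // Fixed fields per application: four long counters plus one boolean flag.
  static final long FIXED_BYTES_PER_APP = 4 * Long.BYTES + 1;

  // One flattened (shuffleId, partitionId, size) entry: int + int + long.
  // Assumption: the nested map is counted as if every entry carried both keys.
  static final long BYTES_PER_PARTITION_ENTRY = 2 * Integer.BYTES + Long.BYTES;

  // Raw payload bytes for one report from one shuffle server.
  static long bytesPerReport(long apps, long partitionEntries) {
    return apps * FIXED_BYTES_PER_APP + partitionEntries * BYTES_PER_PARTITION_ENTRY;
  }

  public static void main(String[] args) {
    long apps = 300;                 // assumed: "hundreds of applications"
    long partitionEntries = 30_000;  // assumed: "tens of thousands of partitions"
    long servers = 500;              // assumed cluster size, purely illustrative

    long perServer = bytesPerReport(apps, partitionEntries);
    System.out.printf("%d bytes per server report%n", perServer);
    System.out.printf("%d bytes per reporting round%n", perServer * servers);
  }
}
```

The partition entries dominate the total, so the cost to the coordinator scales with the partition count and the reporting frequency rather than with the number of applications.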
   
   In general, we only need to check the information of a particular 
application, not all of them.
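
A minimal sketch of what such an on-demand lookup could look like on the shuffle server side, assuming metrics are kept in a map keyed by application id. `AppMetricsLookup`, `AppMetrics`, `record`, and `find` are hypothetical names for illustration and are not part of the existing code; `AppMetrics` is a simplified stand-in for the `ApplicationMetricsVo` quoted above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class AppMetricsLookup {
  // Hypothetical, simplified stand-in for ApplicationMetricsVo.
  static final class AppMetrics {
    final long totalDataSize;
    final boolean existHugePartition;

    AppMetrics(long totalDataSize, boolean existHugePartition) {
      this.totalDataSize = totalDataSize;
      this.existHugePartition = existHugePartition;
    }
  }

  // Metrics stay local to the shuffle server, keyed by application id.
  private final Map<String, AppMetrics> metricsByAppId = new ConcurrentHashMap<>();

  // Store or overwrite the metrics of one application.
  void record(String appId, AppMetrics metrics) {
    metricsByAppId.put(appId, metrics);
  }

  // Return the metrics of a single application, or empty if it is unknown.
  Optional<AppMetrics> find(String appId) {
    return Optional.ofNullable(metricsByAppId.get(appId));
  }
}
```

With this shape, only the one requested application's data leaves the shuffle server, and nothing is pushed to the coordinator periodically.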


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

