SteNicholas opened a new pull request, #2561:
URL: https://github.com/apache/celeborn/pull/2561

   ### What changes were proposed in this pull request?
   
   The worker removes application active connections via a clean thread pool to improve the performance of `cleanup`.
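The following is a minimal sketch of the idea. It assumes the worker tracks active connections in a `ConcurrentHashMap` from connection id to application id. It also assumes removal is handed to a fixed-size `ExecutorService` instead of running inline on the thread that drains the cleanup queue. `ConnectionRegistry`, `removeAppConnectionsAsync` and `cleanPool` are hypothetical names for illustration. They are not the code in this PR.

```scala
import java.util.concurrent.{ConcurrentHashMap, ExecutorService, Executors, TimeUnit}

// Hypothetical registry: connection id -> application id.
// Not Celeborn's actual WorkerSource; names are illustrative only.
class ConnectionRegistry(poolSize: Int) {
  private val appByConnection = new ConcurrentHashMap[String, String]()

  // Dedicated pool (assumption) so slow removals do not block the caller,
  // e.g. the thread that drains the expired-shuffle cleanup queue.
  private val cleanPool: ExecutorService = Executors.newFixedThreadPool(poolSize)

  def register(connectionId: String, appId: String): Unit =
    appByConnection.put(connectionId, appId)

  // Hand each application's removal to the pool and return immediately.
  def removeAppConnectionsAsync(appIds: Set[String]): Unit =
    appIds.foreach { appId =>
      cleanPool.submit(new Runnable {
        override def run(): Unit = removeAppConnections(appId)
      })
    }

  // Scans all entries; ConcurrentHashMap iterators support remove() and
  // tolerate concurrent modification from other pool threads.
  private def removeAppConnections(appId: String): Unit = {
    val it = appByConnection.entrySet().iterator()
    while (it.hasNext) {
      if (it.next().getValue == appId) it.remove()
    }
  }

  def size: Int = appByConnection.size()

  // Stop accepting tasks and wait (up to 10s) for queued removals to finish.
  def shutdown(): Unit = {
    cleanPool.shutdown()
    cleanPool.awaitTermination(10, TimeUnit.SECONDS)
  }
}
```

`removeAppConnectionsAsync(Set("app-1"))` returns as soon as the tasks are queued. The fixed pool size bounds how many removals run at once. Each task in this sketch still scans the whole map, so the work is moved off the calling thread rather than reduced.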
   
   ### Why are the changes needed?
   
   `cleanTaskQueue` can build up a large backlog, which delays the cleanup of expired shuffle keys and causes high CPU usage, as shown below:
   
   ```
   top - 16:09:14 up 52 days, 19:34,  0 users,  load average: 5.42, 8.37, 7.54
   Threads: 941 total,   2 running, 939 sleeping,   0 stopped,   0 zombie
   %Cpu(s):  3.8 us,  1.6 sy,  0.0 ni, 93.6 id,  0.7 wa,  0.0 hi,  0.3 si,  0.0 st
   KiB Mem : 39410694+total,  7602132 free, 18303019+used, 20347460+buff/cache
   KiB Swap:        0 total,        0 free,        0 used. 19179446+avail Mem
   
     PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
     647 celeborn  20   0  0.245t 0.164t      0 R 95.7 44.7   1352:09 java
   ```
   ```
   "worker-expired-shuffle-cleaner" #263 daemon prio=5 os_prio=0 tid=0x00007f06e9f6e000 nid=0x287 runnable [0x00007eecf15ef000]
      java.lang.Thread.State: RUNNABLE
           at scala.collection.convert.Wrappers$JMapWrapperLike$$anon$5.next(Wrappers.scala:288)
           at scala.collection.convert.Wrappers$JMapWrapperLike$$anon$5.next(Wrappers.scala:285)
           at scala.collection.Iterator.foreach(Iterator.scala:941)
           at scala.collection.Iterator.foreach$(Iterator.scala:941)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
           at scala.collection.IterableLike.foreach(IterableLike.scala:74)
           at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
           at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
           at org.apache.celeborn.service.deploy.worker.WorkerSource.removeAppActiveConnection(WorkerSource.scala:112)
   ```
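The frames above show the `worker-expired-shuffle-cleaner` thread inside `WorkerSource.removeAppActiveConnection`. There it iterates a Java map through Scala's `JMapWrapperLike` wrapper. Assume that method scans every tracked connection once per application; the exact data structure is not shown here. Under that assumption, total work grows with the number of applications times the number of connections. The hypothetical `ScanCostSketch` below only counts the entries visited by such per-application scans.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical model of per-application removal by full scan;
// not the actual WorkerSource code.
object ScanCostSketch {
  // Removes every connection owned by appId and returns how many entries were visited.
  def removeByScan(connections: ConcurrentHashMap[String, String], appId: String): Int = {
    var visited = 0
    val it = connections.entrySet().iterator()
    while (it.hasNext) {
      visited += 1
      if (it.next().getValue == appId) it.remove()
    }
    visited
  }

  def main(args: Array[String]): Unit = {
    // 1000 connections spread evenly over 100 applications (10 each).
    val connections = new ConcurrentHashMap[String, String]()
    for (i <- 0 until 1000) connections.put(s"conn-$i", s"app-${i % 100}")

    // One full scan per expired application: 1000 + 990 + ... + 10 visits.
    val totalVisited = (0 until 100).map(a => removeByScan(connections, s"app-$a")).sum
    println(s"entries visited: $totalVisited") // prints "entries visited: 50500"
  }
}
```

For comparison, a single pass that checks all expired applications at once would visit each of the 1000 entries only one time. Repeated full scans behind a long `cleanTaskQueue` backlog are consistent with the busy cleaner thread in the dump above.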
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Cluster test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
