alnzng opened a new pull request, #1636:
URL: https://github.com/apache/samza/pull/1636

   # Symptom
   
   We have observed that some use cases use quasar (a TensorFlow framework) for 
model inference, and this framework spawns non-JVM child processes to run 
TensorFlow Serving. These child processes showed high CPU usage (200%), 
but that usage is not captured by the existing CPU usage metric 
`process-cpu-usage`.
   
   # Cause
   
   The existing `process-cpu-usage` metric was designed to capture 
the [CPU usage of the JVM 
process](https://samza.apache.org/learn/documentation/1.6.0/operations/monitoring.html)
 only, so it does not count the usage of child processes (especially non-JVM 
processes).
   
   # Changes
   
   - Rely on the [OSHI framework](https://www.oshi.ooo/) to capture the CPU usage 
of the JVM process and all its child processes, and create a new metric to 
report the total CPU usage.
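
   To illustrate the idea of measuring a whole process tree rather than the JVM alone, here is a minimal sketch. It is not the code in this PR: the PR relies on OSHI, while this sketch swaps in the JDK's own `ProcessHandle` API (Java 9+) so that it runs without extra dependencies. The class name `ProcessTreeCpu` and the methods `totalCpuTime` and `usagePercent` are hypothetical names chosen for this example.

```java
import java.time.Duration;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only; not part of Samza or OSHI.
public class ProcessTreeCpu {

    // Sum of the CPU time consumed so far by this JVM and all of its
    // currently live descendant processes (JVM or non-JVM).
    static Duration totalCpuTime() {
        ProcessHandle self = ProcessHandle.current();
        return Stream.concat(Stream.of(self), self.descendants())
                // totalCpuDuration() is empty when the OS does not expose it.
                .map(p -> p.info().totalCpuDuration().orElse(Duration.ZERO))
                .reduce(Duration.ZERO, Duration::plus);
    }

    // CPU usage over a sampling interval as a percentage of one core,
    // so 200.0 means two cores were fully busy.
    static double usagePercent(Duration before, Duration after, Duration wall) {
        if (wall.isZero() || wall.isNegative()) {
            return 0.0;
        }
        // A child that exits between samples drops out of the sum,
        // so clamp the difference at zero.
        long cpuNanos = Math.max(0L, after.minus(before).toNanos());
        return 100.0 * cpuNanos / wall.toNanos();
    }

    public static void main(String[] args) throws InterruptedException {
        Duration before = totalCpuTime();
        long start = System.nanoTime();
        Thread.sleep(1000); // sampling interval
        Duration wall = Duration.ofNanos(System.nanoTime() - start);
        Duration after = totalCpuTime();
        System.out.printf("total process-tree CPU usage: %.1f%%%n",
                usagePercent(before, after, wall));
    }
}
```

   Sampling the cumulative CPU time twice and dividing by the elapsed wall-clock time is one common way to turn per-process counters into a usage percentage. A JVM-only metric would see just the first element of the stream and miss the descendants.
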
   
   # API Changes
   
   - Added a new metric `total-process-cpu-usage` to `SamzaContainerMetrics`, 
similar to [how the `physical-memory-mb` metric was 
provided](https://github.com/apache/samza/pull/1530)
   
   # Tests
   
   - Unit tests
   - Tested with `samza-hello-samza` and verified the metric data points
   ![Screen Shot 2022-10-25 at 10 23 58 
PM](https://user-images.githubusercontent.com/59407935/197942249-21dae598-7e18-4bb0-88e5-6a752ca49765.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.