[ https://issues.apache.org/jira/browse/FLINK-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333330#comment-14333330 ]

ASF GitHub Bot commented on FLINK-1501:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/421#issuecomment-75545711
  
    Thanks everybody for the positive feedback!
    > What does the OS load mean? It would be really awesome to show the CPU load, too. I think this is a helpful indicator.
    
    On the OS load: http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
    
    I totally agree that the OS load is not a very good metric for our purposes.
    I didn't try to get better metrics because I didn't want to resort to "ugly tricks" to obtain them.
    My code gets the metrics only via the JVM's management beans, and the standard `OperatingSystemMXBean` exposes only the system load average and the number of available processors:
    
    http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage()
    There is also an extended, vendor-specific interface, `com.sun.management.OperatingSystemMXBean` (https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html), which additionally exposes methods such as `getProcessCpuLoad()`.
    However, whether this bean is available depends on the JVM in use (vendor, version, etc.).
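    A minimal sketch of reading these values through the management beans, assuming a HotSpot/OpenJDK JVM for the optional part. The class `OsMetricsProbe` and its method names are made up for illustration; only the `ManagementFactory` and `OperatingSystemMXBean` calls are real JDK API. The `instanceof` check is what "availability depends on the JVM" means in practice: the extended bean has to be probed for rather than assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper class, for illustration only.
public class OsMetricsProbe {

    /** Load average over the last minute; negative if the platform cannot provide it (e.g. Windows). */
    public static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    /** Number of processors available to this JVM. */
    public static int availableProcessors() {
        return ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
    }

    /**
     * Recent CPU usage of this JVM process in [0.0, 1.0], or a negative value if unknown.
     * Relies on the vendor-specific com.sun.management extension, which ships with
     * HotSpot/OpenJDK but is not guaranteed on every JVM.
     */
    public static double processCpuLoad() {
        try {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof com.sun.management.OperatingSystemMXBean) {
                return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
            }
        } catch (NoClassDefFoundError e) {
            // Best effort: the extension interface is not available on this JVM.
        }
        return -1.0;
    }

    public static void main(String[] args) {
        double load = systemLoadAverage();
        int cores = availableProcessors();
        System.out.println("load average: " + load + " on " + cores + " cores");
        if (load >= 0) {
            // Normalizing by core count makes the number comparable across machines.
            System.out.println("load per core: " + (load / cores));
        }
        System.out.println("process CPU load: " + processCpuLoad());
    }
}
```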
    
    Another way to get the CPU load of the process would be to parse the output of `ps` or `top`, but that also falls into the category of "ugly tricks".
    I still think we should aim to get those metrics into the system as well. Adding them is a matter of registering another `Gauge` in the TaskManager's metrics registry and visualizing its value from the JSON output.
    I hope that external contributors will pick up refinements like these.
    Once this PR has been merged, I'll file a JIRA issue to improve the CPU monitoring.
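    Registering another metric could look roughly like the following. To stay self-contained, the sketch does not use the Dropwizard library; `GaugeRegistrySketch` is a hypothetical stand-in that only mirrors the idea behind Dropwizard's `MetricRegistry.register(name, gauge)`: a gauge is a callback that is sampled each time the metrics are serialized, so adding a CPU metric is one more registration. The JSON layout here is illustrative and not necessarily the format Flink's web interface uses.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

/**
 * Hypothetical, dependency-free stand-in for a metrics registry.
 * Gauges are callbacks; they are evaluated only when toJson() is called.
 */
public class GaugeRegistrySketch {

    // TreeMap keeps the JSON output in a stable, sorted order.
    private final Map<String, Supplier<? extends Number>> gauges = new TreeMap<>();

    public void register(String name, Supplier<? extends Number> gauge) {
        if (gauges.putIfAbsent(name, gauge) != null) {
            throw new IllegalArgumentException("A gauge named " + name + " already exists");
        }
    }

    /** Samples every gauge and renders {"gauges":{"name":{"value":...},...}}. Assumes names need no JSON escaping. */
    public String toJson() {
        StringBuilder sb = new StringBuilder("{\"gauges\":{");
        boolean first = true;
        for (Map.Entry<String, Supplier<? extends Number>> e : gauges.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(e.getKey()).append("\":{\"value\":")
              .append(e.getValue().get()).append('}');
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        GaugeRegistrySketch registry = new GaugeRegistrySketch();
        // A load-style metric from the standard MXBean.
        registry.register("os.load",
                () -> ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        // Adding another metric is just one more registration.
        registry.register("os.cores",
                () -> ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors());
        System.out.println(registry.toJson());
    }
}
```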
    
    > What are the current options for showing the detailed metrics? I see "show 3 TMs" and "show all TMs" buttons in the screenshot. Can you select which three to show?
    
    No, you cannot choose which three TMs are shown.
    I added these buttons because on a large Flink cluster (50+ nodes), updating all the charts puts considerable load on the browser. Usually it's sufficient to monitor the load of only a few TMs, because (ideally) they are all doing mostly the same work.
    But I agree that there is room for improvement.
    
    > How about we open a document and sketch the design of the monitoring and create smaller PRs to get there step-by-step.
    
    I totally agree that we should make small, incremental improvements.
    As I said in the PR description, the primary purpose of this PR is to get the basic monitoring infrastructure in place; how we present the data in the end is a matter for further PRs.
    
    
    I have started working on the "per-job" monitoring and found that I have to change some details of this PR as well.
    Depending on my progress on the "per-job" monitoring, I might contribute those changes here together with the "per-job" metrics. If I don't have enough time to open a PR for the per-job metrics this week, I'll merge this change to master.


> Integrate metrics library and report basic metrics to JobManager web interface
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-1501
>                 URL: https://issues.apache.org/jira/browse/FLINK-1501
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>             Fix For: pre-apache
>
>
> As per mailing list, the library: https://github.com/dropwizard/metrics
> The goal of this task is to get the basic infrastructure in place.
> Subsequent issues will integrate more features into the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
