[ https://issues.apache.org/jira/browse/FLINK-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604536#comment-15604536 ]

ASF GitHub Bot commented on FLINK-4888:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2683#discussion_r84846296
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -1828,6 +1828,33 @@ class JobManager(
         jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numRunningJobs", new Gauge[Long] {
           override def getValue: Long = JobManager.this.currentJobs.size
         })
    +    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numFailedJobs", new Gauge[Long] {
    +      override def getValue: Long = {
    +         var failedJobs = 0
    +         val ourJobs = createJobStatusOverview()
    +         val future = (archive ? RequestJobsOverview.getInstance())(timeout)
    +         val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
    +         failedJobs += ourJobs.getNumJobsFailed() + archivedJobs.getNumJobsFailed()
    +         failedJobs
    +    }})
    +    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numCancelledJobs", new Gauge[Long] {
    +      override def getValue: Long = {
    +         var cancelledJobs = 0
    +         val ourJobs = createJobStatusOverview()
    +         val future = (archive ? RequestJobsOverview.getInstance())(timeout)
    +         val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
    +         cancelledJobs += ourJobs.getNumJobsCancelled() + archivedJobs.getNumJobsCancelled()
    +         cancelledJobs
    +    }})
    +    jobManagerMetricGroup.gauge[Long, Gauge[Long]]("numFinishedJobs", new Gauge[Long] {
    +      override def getValue: Long = {
    +         var finishedJobs = 0
    +         val ourJobs = createJobStatusOverview()
    +         val future = (archive ? RequestJobsOverview.getInstance())(timeout)
    +         val archivedJobs : JobsOverview = Await.result(future, timeout).asInstanceOf[JobsOverview]
    +         finishedJobs += ourJobs.getNumJobsFinished() + archivedJobs.getNumJobsFinished()
    +         finishedJobs
    +    }})
    --- End diff --
    
    Generally I would say no, since there is always a chance that it may block for the full timeout duration.
    So in this case, in theory, with the default timeout of 10 seconds, we could block the reporter thread for half a minute (three gauges, each waiting up to 10 seconds on the archive). This isn't very likely, since we query the MemoryArchivist within the JM, but it is still possible.
    
    I'm also wondering whether it makes sense to add these metrics at all, given that FLIP-6 is around the corner and will make them obsolete anyway.
    
    If we merge it, I would like to see some shared object so that we don't make the same RPC call 3 times.
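
A minimal sketch of such a shared object, assuming a synchronous `fetch` function that wraps the combined `createJobStatusOverview()` plus archive request from the diff: all three gauges read from one cached overview, so the request runs at most once per `maxAgeMillis`. `SharedJobsOverview`, `OverviewGauge`, `fetch`, `maxAgeMillis`, `clock`, and the simplified `Gauge`/`JobsOverview` types are hypothetical, not Flink API.

```scala
// Hypothetical stand-in for the metrics Gauge interface used in the diff.
trait Gauge[T] { def getValue: T }

// Hypothetical stand-in for the combined (local + archived) job overview.
final case class JobsOverview(failed: Int, cancelled: Int, finished: Int)

// Shared holder: the expensive fetch (the RPC in the diff) runs at most once
// per maxAgeMillis, no matter how many gauges read from it.
final class SharedJobsOverview(
    fetch: () => JobsOverview,
    maxAgeMillis: Long,
    clock: () => Long = () => System.currentTimeMillis()
) {
  // Time of the last fetch together with its result.
  private var cached: Option[(Long, JobsOverview)] = None

  def get(): JobsOverview = synchronized {
    val now = clock()
    cached match {
      case Some((fetchedAt, overview)) if now - fetchedAt < maxAgeMillis => overview
      case _ =>
        val fresh = fetch()
        cached = Some((now, fresh))
        fresh
    }
  }
}

// Each gauge just selects one count from the shared overview.
final class OverviewGauge(shared: SharedJobsOverview, select: JobsOverview => Int)
    extends Gauge[Long] {
  override def getValue: Long = select(shared.get()).toLong
}
```

Registering one `OverviewGauge` each for `_.failed`, `_.cancelled`, and `_.finished` against the same `SharedJobsOverview` turns three RPCs per reporting cycle into one. Because `fetch` still blocks, this removes the duplicate calls but does not by itself bound how long the reporter thread can wait.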


> instantiated job manager metrics missing important job statistics 
> ------------------------------------------------------------------
>
>                 Key: FLINK-4888
>                 URL: https://issues.apache.org/jira/browse/FLINK-4888
>             Project: Flink
>          Issue Type: Improvement
>          Components: Metrics
>    Affects Versions: 1.1.2
>            Reporter: Philipp von dem Bussche
>            Assignee: Philipp von dem Bussche
>            Priority: Minor
>
> A jobmanager is currently (only) instantiated with the following metrics: 
> taskSlotsAvailable, taskSlotsTotal, numRegisteredTaskManagers and 
> numRunningJobs. Other important metrics would be numFailedJobs,
> numCancelledJobs and numFinishedJobs. Also, to achieve parity between JobManager
> metrics and what's available via the REST API, it would be good to have these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
