[ https://issues.apache.org/jira/browse/SPARK-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sachin Aggarwal updated SPARK-14597:
------------------------------------
    Description: 
While tuning our streaming application, the piece of information we were 
looking for was the actual processing time per batch. The 
StreamingListener.onBatchCompleted event provides a BatchInfo object with 
this information. It exposes the following data:
 - processingDelay
 - schedulingDelay
 - totalDelay
 - submissionTime
 The delay values are essentially calculated by the streaming JobScheduler 
clocking the processingStartTime and processingEndTime of each JobSet, while 
submissionTime records when a JobSet was put on the streaming scheduler's 
queue (a listener that logs these values is sketched below). 
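
For reference, this is roughly how we collect the metrics today. The listener 
class name and log format below are ours; only the StreamingListener / 
BatchInfo API is Spark's:

{code:scala}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Logs the per-batch metrics that BatchInfo exposes today.
class BatchTimingListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    println(s"batch=${info.batchTime} " +
      s"submissionTime=${info.submissionTime} " +
      s"schedulingDelay=${info.schedulingDelay.getOrElse(-1L)}ms " +
      s"processingDelay=${info.processingDelay.getOrElse(-1L)}ms " +
      s"totalDelay=${info.totalDelay.getOrElse(-1L)}ms")
  }
}

// Registered on the StreamingContext before start():
//   ssc.addStreamingListener(new BatchTimingListener)
{code}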
 
So we took processingDelay as our actual processing time per batch. However, 
to keep the streaming application stable, we found that our batch interval had 
to be set to a little less than DOUBLE the reported processingDelay (we are 
using a DirectKafkaInputDStream). On digging further, we found that 
processingDelay only clocks the time spent running the jobs generated from the 
foreachRDD closures of the streaming application, and that the JobGenerator's 
graph.generateJobs method 
(https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala#L248)
 takes significantly more time.
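
For context, our job is essentially of the following shape (a minimal sketch 
against the Spark 1.6 spark-streaming-kafka API; the broker address, topic 
name, application name, and batch interval are placeholders). processingDelay 
covers only the work done for the foreachRDD block below, not the per-batch 
DStream graph work done in generateJobs:

{code:scala}
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectKafkaJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirectKafkaJob")
    val ssc  = new StreamingContext(conf, Seconds(10))             // batch interval (placeholder)

    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")  // placeholder broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))                              // placeholder topic

    // Only the time spent running the jobs produced by this closure is
    // reflected in processingDelay.
    stream.foreachRDD { rdd =>
      println(s"records in batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}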

 Thus a true reflection of the per-batch processing time is the sum of the 
following (a sketch of the combined metric follows this list):
 a - time spent in the JobGenerator's job queue (JobGeneratorQueueDelay)
 b - time spent in the JobGenerator's graph.generateJobs (JobSetCreationDelay)
 c - time spent in the JobScheduler queue for a JobSet (existing schedulingDelay 
metric)
 d - time spent running the JobSet's jobs (existing processingDelay metric)
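
If BatchInfo also carried the two proposed values, the combined time could be 
derived along these lines (purely a sketch; jobGeneratorQueueDelay and 
jobSetCreationDelay are the proposed metrics, not existing fields):

{code:scala}
import org.apache.spark.streaming.scheduler.BatchInfo

// Sketch only: (a) and (b) are the proposed metrics, passed in here because
// BatchInfo does not expose them today; (c) and (d) already exist on BatchInfo.
def trueProcessingTime(info: BatchInfo,
                       jobGeneratorQueueDelay: Long, // (a) proposed
                       jobSetCreationDelay: Long     // (b) proposed
                      ): Long = {
  jobGeneratorQueueDelay +
    jobSetCreationDelay +
    info.schedulingDelay.getOrElse(0L) +  // (c) existing
    info.processingDelay.getOrElse(0L)    // (d) existing
}
{code}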
 
 Additionally, the JobGeneratorQueueDelay (item a) could be caused either by 
graph.generateJobs taking longer than the batch interval or by other 
JobGenerator events, such as checkpointing, adding up time on the same event 
loop. Thus it would be beneficial to report the time taken by the 
checkpointing job as well.
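
For completeness, checkpointing is enabled on the context as below (using the 
ssc from the sketch above; the directory is a placeholder). Once enabled, the 
checkpoint work is scheduled through the same JobGenerator, which is why it 
can contribute to the queue delay described above:

{code:scala}
// Enabling checkpointing (placeholder directory). The resulting checkpoint
// work runs through the JobGenerator's event loop alongside job generation.
ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")
{code}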


> Streaming Listener timing metrics should include time spent in JobGenerator's 
> graph.generateJobs
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14597
>                 URL: https://issues.apache.org/jira/browse/SPARK-14597
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Streaming
>    Affects Versions: 1.6.1, 2.0.0
>            Reporter: Sachin Aggarwal
>            Priority: Minor
>


