Sachin Aggarwal created SPARK-14597:
---------------------------------------
Summary: Streaming Listener timing metrics should include time
spent in JobGenerator's graph.generateJobs
Key: SPARK-14597
URL: https://issues.apache.org/jira/browse/SPARK-14597
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.6.1, 2.0.0
Reporter: Sachin Aggarwal
Priority: Minor
While tuning our streaming application, the piece of information we were
looking for was the actual processing time per batch. The
StreamingListener.onBatchCompleted event provides a BatchInfo object that
carries this information. It exposes the following data:
- processingDelay
- schedulingDelay
- totalDelay
- submissionTime
The delays above are essentially calculated from the streaming JobScheduler
clocking the processingStartTime and processingEndTime for each JobSet. The
remaining value, submissionTime, is the time at which a JobSet was put on the
streaming scheduler's queue.
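
As a rough illustration of how these values relate to one another, the sketch
below recomputes the three delays from the three timestamps. It is a plain
Python model, not Spark code: the BatchTimes class and its snake_case field
names are hypothetical stand-ins for the corresponding BatchInfo fields, and
all times are assumed to be in milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchTimes:
    """Hypothetical stand-in for the timestamps carried by BatchInfo (ms)."""

    submission_time: int        # JobSet put on the JobScheduler queue
    processing_start_time: int  # JobScheduler clocks the start of the JobSet
    processing_end_time: int    # JobScheduler clocks the end of the JobSet

    @property
    def scheduling_delay(self) -> int:
        # Time the JobSet waited in the JobScheduler queue.
        return self.processing_start_time - self.submission_time

    @property
    def processing_delay(self) -> int:
        # Time spent running the jobs of the JobSet.
        return self.processing_end_time - self.processing_start_time

    @property
    def total_delay(self) -> int:
        # Waiting plus running, measured from submission.
        return self.processing_end_time - self.submission_time


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    batch = BatchTimes(1_000, 1_200, 4_200)
    print(batch.scheduling_delay, batch.processing_delay, batch.total_delay)
```

Note that none of the three timestamps is taken before the JobSet reaches the
JobScheduler queue, so anything that happens earlier is invisible to these
delays.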
So we took processingDelay as our actual processing time per batch. However, to
maintain a stable streaming application, we found that our batch interval
had to be a little less than DOUBLE the processingDelay metric reported. (We
are using a DirectKafkaInputDStream.) On digging further, we found that
processingDelay only clocks the time spent in the foreachRDD closure of the
streaming application, and that JobGenerator's graph.generateJobs
(https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala#L248)
method takes a significant amount of additional time.
Thus a true reflection of processing time is the sum of:
a - Time spent in JobGenerator's job queue (jobGenerator scheduling delay or
JobGeneratorQueue delay)
b - Time spent in JobGenerator's graph.generateJobs
(generateJobProcessingDelay)
c - Time spent in the JobScheduler queue for a JobSet (existing schedulingDelay
metric)
d - Time spent in the JobSet's job run (existing processingDelay metric)
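
The breakdown above can be sketched as simple arithmetic. This is only an
illustrative model, not a proposed implementation: the BatchBreakdown class and
its field names are hypothetical, and the numbers in the usage example are made
up rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchBreakdown:
    """Hypothetical per-batch timing breakdown, all values in milliseconds."""

    generator_queue_delay: int  # (a) wait in the JobGenerator's queue
    generate_jobs_delay: int    # (b) time inside graph.generateJobs
    scheduling_delay: int       # (c) existing schedulingDelay metric
    processing_delay: int       # (d) existing processingDelay metric

    def reported(self) -> int:
        # What the existing metrics account for: (c) + (d).
        return self.scheduling_delay + self.processing_delay

    def unreported(self) -> int:
        # What the existing metrics miss: (a) + (b).
        return self.generator_queue_delay + self.generate_jobs_delay

    def end_to_end(self) -> int:
        # Sum of all four components.
        return self.reported() + self.unreported()


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    batch = BatchBreakdown(
        generator_queue_delay=100,
        generate_jobs_delay=2_500,
        scheduling_delay=200,
        processing_delay=3_000,
    )
    print(batch.reported(), batch.unreported(), batch.end_to_end())
```

With numbers like these, the time missed by the existing metrics is of the same
order as processingDelay itself, which is the kind of gap described above.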
Additionally, a JobGeneratorQueue delay (a) could be due either to
graph.generateJobs taking longer than the batch interval or to other
JobGenerator events, such as checkpointing, adding up time. Thus it would be
beneficial to report the time taken by the checkpointing job as well.