Sean Owen resolved SPARK-23397.
    Resolution: Not A Problem

The body of "foreachRDD" is what gets executed at each batch; one answer is 
to make sure you don't re-execute logic at each batch that you don't need to. 

But in general you're just saying that complex operations sometimes take a 
long time, or that you'd prefer a certain operation were faster. Neither 
relates to the original issue here, which is about scheduling.
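
As a rough illustration of the "foreachRDD" advice above, here is a minimal 
sketch that builds an expensive value once, before the stream starts, rather 
than inside the per-batch closure. The object name, loadLookupTable(), the 
socket source on localhost:9999, and the local[2] master are all hypothetical 
and only there to make the example self-contained; the 20-second interval 
comes from the report.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HoistSetupSketch {
  // Hypothetical stand-in for setup that is expensive and does not change per batch.
  def loadLookupTable(): Map[String, Int] = Map("a" -> 1, "b" -> 2)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("hoist-setup-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(20))

    // Assumed input source, for illustration only.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Built and broadcast once on the driver, outside foreachRDD.
    val lookup = ssc.sparkContext.broadcast(loadLookupTable())

    lines.foreachRDD { rdd =>
      // This closure body runs on the driver once per batch, so it only
      // contains the work that genuinely has to happen for every batch.
      val total = rdd.map(word => lookup.value.getOrElse(word, 0)).fold(0)(_ + _)
      println(s"batch total: $total")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Had loadLookupTable() been called inside the foreachRDD closure, it would be 
re-run for every 20-second batch.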

> Scheduling delay causes Spark Streaming to miss batches.
> --------------------------------------------------------
>                 Key: SPARK-23397
>                 URL: https://issues.apache.org/jira/browse/SPARK-23397
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.2.1
>            Reporter: Shahbaz Hussain
>            Priority: Major
> * For complex Spark (Scala) DStream-based applications that create many jobs 
> per batch (for example, 40), batches are not created at the expected time. 
> For example, if I start a Spark Streaming application with a batch interval 
> of 20 seconds and the application creates about 40 jobs, the next batch is 
> not created 20 seconds after the previous job creation time.
> * This is because job creation is single-threaded: if the job creation delay 
> exceeds the batch interval, batch execution misses its schedule.
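
The arithmetic behind the reporter's claim can be sketched with a toy model. 
This is not Spark's scheduler, only an assumption-laden illustration: each 
batch is due every intervalSec seconds, job creation for a batch takes a fixed 
generationSec seconds, and a single thread handles batches one at a time. The 
25-second and 15-second creation times are made-up values; only the 20-second 
interval comes from the report.

```scala
object ScheduleSlipSketch {
  // For each batch, returns how many seconds after its due time
  // job creation for that batch actually starts.
  def delays(intervalSec: Int, generationSec: Int, batches: Int): Seq[Int] = {
    var previousFinish = 0
    (0 until batches).map { i =>
      val due = i * intervalSec
      // A single thread cannot start this batch until the previous one is done.
      val start = math.max(due, previousFinish)
      previousFinish = start + generationSec
      start - due
    }
  }

  def main(args: Array[String]): Unit = {
    // Creation (25 s) slower than the interval (20 s): the lag grows by 5 s per batch.
    println(delays(intervalSec = 20, generationSec = 25, batches = 4))
    // Creation (15 s) faster than the interval: no batch starts late.
    println(delays(intervalSec = 20, generationSec = 15, batches = 4))
  }
}
```

In this model the lag grows only when per-batch work exceeds the interval, so 
the remedy is to shrink that per-batch work.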

