[jira] [Updated] (SPARK-11308) Change spark streaming's job scheduler logic to ensure guaranteed order of batch processing

Hyukjin Kwon (JIRA) Mon, 20 May 2019 21:48:07 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon updated SPARK-11308:
---------------------------------
    Labels: bulk-closed  (was: )

> Change spark streaming's job scheduler logic to ensure guaranteed order of 
> batch processing
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11308
>                 URL: https://issues.apache.org/jira/browse/SPARK-11308
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>    Affects Versions: 1.5.1
>            Reporter: Renjie Liu
>            Priority: Major
>              Labels: bulk-closed
>
> In the current implementation, Spark Streaming uses a thread pool to run the 
> jobs generated in each batch interval, and their order is not guaranteed. 
> For example, if the jobs generated at time 1 take longer than the batch 
> duration, the jobs for time 2 begin to execute, and the order in which the 
> two batches finish is not guaranteed. This is not very useful in practice 
> because it may cost much more storage. For example, to keep a streaming word 
> count accurate, the output must be idempotent, which means storing the 
> records of each batch in the database rather than just the word counts. If 
> the processing order of batches is guaranteed, it is enough to store the 
> last update time alongside each word count to be idempotent. Simply setting 
> the thread pool size to 1 may make the system inefficient when there is more 
> than one output stream. This feature can be implemented by giving each 
> output stream its own thread, so that the jobs of each output stream are 
> executed in order.
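
The proposal in the description can be illustrated with a minimal sketch. It
is plain Python rather than Spark code and is not the Spark scheduler's
implementation. The names PerStreamScheduler, WordCountStore and run_demo are
hypothetical, and an in-memory dict stands in for the database. Each output
stream gets its own single-threaded executor, so jobs of one stream run in
submission order while different streams still run in parallel. Because the
order is guaranteed, the store only keeps a count and the last update time
per word and can skip a replayed batch.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


class PerStreamScheduler:
    """One single-threaded executor per output stream (hypothetical name)."""

    def __init__(self, stream_names):
        self._executors = {
            name: ThreadPoolExecutor(max_workers=1) for name in stream_names
        }

    def submit(self, stream, job, *args):
        # A single worker drains its queue in FIFO order, so jobs of one
        # stream run in submission order; other streams are not blocked.
        return self._executors[stream].submit(job, *args)

    def shutdown(self):
        for executor in self._executors.values():
            executor.shutdown(wait=True)


class WordCountStore:
    """In-memory stand-in for a table: word -> (count, last_batch_time)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def apply_batch(self, batch_time, counts):
        with self._lock:
            for word, delta in counts.items():
                count, last_time = self._rows.get(word, (0, -1))
                # Skip a batch that was already applied (for example a
                # replay after a failure). This check is only valid because
                # batches are applied in increasing time order.
                if batch_time <= last_time:
                    continue
                self._rows[word] = (count + delta, batch_time)

    def count(self, word):
        with self._lock:
            return self._rows.get(word, (0, -1))[0]


def run_demo():
    store = WordCountStore()
    order = []  # (stream, batch_time) in completion order

    def word_count_job(batch_time, counts, delay):
        time.sleep(delay)  # simulate a batch that runs long
        store.apply_batch(batch_time, counts)
        order.append(("wordcount", batch_time))

    def other_job(batch_time):
        order.append(("other", batch_time))

    scheduler = PerStreamScheduler(["wordcount", "other"])
    scheduler.submit("wordcount", word_count_job, 1, {"spark": 2}, 0.05)
    scheduler.submit("wordcount", word_count_job, 2, {"spark": 3}, 0.0)
    # Replay of batch 1: must not be counted a second time.
    scheduler.submit("wordcount", word_count_job, 1, {"spark": 2}, 0.0)
    scheduler.submit("other", other_job, 1)
    scheduler.shutdown()
    return store, order


if __name__ == "__main__":
    store, order = run_demo()
    print(store.count("spark"), order)
```

In this sketch the slow batch 1 still finishes before batch 2 on the
"wordcount" stream, and the "other" stream does not wait for either of them.
Without the ordering guarantee the last-update-time check would be unsafe,
which is the storage cost the description refers to.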



