[jira] [Updated] (SPARK-10734) DirectKafkaInputDStream uses the OffsetRequest.LatestTime to find the latest offset, however using the batch time would be more desirable.

Bijay Singh Bisht (JIRA) Mon, 21 Sep 2015 11:11:51 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Bijay Singh Bisht updated SPARK-10734:
--------------------------------------
    Component/s: Input/Output

> DirectKafkaInputDStream uses the OffsetRequest.LatestTime to find the latest 
> offset, however using the batch time would be more desirable.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10734
>                 URL: https://issues.apache.org/jira/browse/SPARK-10734
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>            Reporter: Bijay Singh Bisht
>
> DirectKafkaInputDStream uses OffsetRequest.LatestTime to find the latest 
> offset. However, since OffsetRequest.LatestTime is relative, the result 
> depends on when the batch is scheduled. One would expect that, given an input 
> data set, the data in each batch should be predictable, irrespective of 
> system conditions. Using the batch time implies that the stream processing 
> will produce the same batches irrespective of when the processing was 
> started and of the load conditions on the system.
> This, along with [SPARK-10732], provides a nice regression scenario.
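
The difference described above can be illustrated with a small, self-contained
simulation. This is only a sketch under stated assumptions, not Spark or Kafka
code: the partition is modeled as a sorted list of message timestamps (list
index = offset), and the names latest_offset, batch_ranges and delays_ms are
hypothetical. It also assumes an exact timestamp-to-offset lookup, which a real
broker's time-based offset query may not provide at this granularity.

```python
from bisect import bisect_right

def latest_offset(timestamps, now_ms):
    """End offset visible at now_ms (hypothetical stand-in for a 'latest' lookup)."""
    # Number of messages with timestamp <= now_ms; timestamps must be sorted.
    return bisect_right(timestamps, now_ms)

def batch_ranges(timestamps, batch_times_ms, delays_ms, by_batch_time):
    """Return one (start, end) offset range per batch.

    by_batch_time=True  : the upper bound is looked up at the batch time itself.
    by_batch_time=False : the upper bound is looked up when the batch actually
                          gets scheduled (batch time + scheduling delay).
    """
    ranges = []
    start = 0
    for batch_time, delay in zip(batch_times_ms, delays_ms):
        lookup_time = batch_time if by_batch_time else batch_time + delay
        end = latest_offset(timestamps, lookup_time)
        ranges.append((start, end))
        start = end
    return ranges

# Assumed sample data: message timestamps in ms and three 1-second batches.
timestamps = [100, 250, 900, 1100, 1500, 1950, 2100]
batch_times = [1000, 2000, 3000]

# Two runs over the same input with different (simulated) scheduling delays.
run_a_latest = batch_ranges(timestamps, batch_times, [200, 0, 0], by_batch_time=False)
run_b_latest = batch_ranges(timestamps, batch_times, [0, 150, 0], by_batch_time=False)
run_a_batch = batch_ranges(timestamps, batch_times, [200, 0, 0], by_batch_time=True)
run_b_batch = batch_ranges(timestamps, batch_times, [0, 150, 0], by_batch_time=True)

print(run_a_latest, run_b_latest)  # batch contents vary with the delays
print(run_a_batch, run_b_batch)    # batch contents are identical across runs
```

In this model, the ranges derived from the batch time are the same for both
runs, while the ranges derived from the scheduling time shift with the delays,
which is the unpredictability the issue describes.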



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

