[
https://issues.apache.org/jira/browse/SPARK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bijay Singh Bisht updated SPARK-10734:
--------------------------------------
Component/s: Input/Output
> DirectKafkaInputDStream uses OffsetRequest.LatestTime to find the latest
> offset; however, using the batch time would be more desirable.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-10734
> URL: https://issues.apache.org/jira/browse/SPARK-10734
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output
> Reporter: Bijay Singh Bisht
>
> DirectKafkaInputDStream uses OffsetRequest.LatestTime to find the latest
> offset. However, since OffsetRequest.LatestTime is relative, the result
> depends on when the batch is scheduled. One would expect that, given an input
> data set, the data in each batch should be predictable, irrespective of
> system conditions. Using the batch time means that the stream processing
> will produce the same batches irrespective of when the processing was
> started and of the load conditions on the system.
> This, along with [SPARK-10732], provides for nice regression scenarios.
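
The difference described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not call Kafka or Spark, and the names `LOG_TIMES`, `latest_offset`, `offset_at`, `by_latest` and `by_batch_time` are hypothetical, made up for illustration. A single partition is modelled as a list of append timestamps, and a scheduling delay stands in for system load.

```python
from bisect import bisect_right

# Hypothetical stand-in for one partition: message i was appended at LOG_TIMES[i].
LOG_TIMES = [100, 150, 220, 310, 390, 480, 560]

# Nominal batch times of three consecutive batches.
BATCH_TIMES = [200, 400, 600]


def latest_offset(now):
    """End offset seen when the lookup runs at wall-clock time `now`.

    Mimics a "latest" lookup: the answer depends on when it is asked.
    """
    return bisect_right(LOG_TIMES, now)


def offset_at(batch_time):
    """End offset covering every message appended at or before `batch_time`."""
    return bisect_right(LOG_TIMES, batch_time)


def batches(until_offset):
    """Build (from_offset, until_offset) ranges, one per batch time."""
    ranges, start = [], 0
    for t in BATCH_TIMES:
        end = until_offset(t)
        ranges.append((start, end))
        start = end
    return ranges


def by_latest(delay):
    """Batch boundaries when each lookup happens `delay` after the batch time."""
    return batches(lambda t: latest_offset(t + delay))


def by_batch_time(delay):
    """Batch boundaries keyed on the batch time; `delay` has no effect."""
    return batches(offset_at)


if __name__ == "__main__":
    for delay in (0, 50):
        print(delay, by_latest(delay), by_batch_time(delay))
```

With a delay of 50 the "latest" variant moves one message from the second batch into the first, while the batch-time variant yields the same ranges for any delay, which is the property that makes regression runs repeatable.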