Bijay Singh Bisht created SPARK-10734:
-----------------------------------------
Summary: DirectKafkaInputDStream uses OffsetRequest.LatestTime
to find the latest offset; using the batch time would be more
desirable.
Key: SPARK-10734
URL: https://issues.apache.org/jira/browse/SPARK-10734
Project: Spark
Issue Type: Improvement
Reporter: Bijay Singh Bisht
DirectKafkaInputDStream uses OffsetRequest.LatestTime to find the latest
offset. However, since OffsetRequest.LatestTime is relative, the result depends
on when the batch is scheduled. One would expect that, given an input data set,
the data in each batch should be predictable, irrespective of system
conditions. Using the batch time means that the stream processing will produce
the same batches irrespective of when the processing was started and
the load on the system.
This, along with [SPARK-10732], provides a nice regression scenario.
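
To illustrate the difference, here is a rough sketch in plain Python. It is a
toy model, not Spark or Kafka code: the partition is assumed to be an in-memory
list of append timestamps, and the names append_times_ms, offset_at,
batches_using_latest and batches_using_batch_time are hypothetical. It only
shows why a "latest" lookup makes batch boundaries depend on scheduling delay,
while a lookup keyed on the batch time does not.

```python
from bisect import bisect_right

# Hypothetical stand-in for one Kafka partition: the time (ms) at which each
# message was appended. The list index plays the role of the message offset.
append_times_ms = [100, 250, 900, 1100, 1900, 2050, 2600]


def offset_at(time_ms):
    """Exclusive end offset covering all messages appended at or before time_ms."""
    return bisect_right(append_times_ms, time_ms)


def batches_using_latest(batch_times_ms, scheduling_delay_ms):
    """Offset ranges when each batch asks for the latest offset at the moment
    it actually runs, i.e. scheduling_delay_ms after its nominal batch time."""
    ranges, start = [], 0
    for batch_time in batch_times_ms:
        end = max(start, offset_at(batch_time + scheduling_delay_ms))
        ranges.append((start, end))
        start = end
    return ranges


def batches_using_batch_time(batch_times_ms, scheduling_delay_ms):
    """Offset ranges when each batch asks for the offset as of its nominal
    batch time; the scheduling delay is deliberately ignored."""
    ranges, start = [], 0
    for batch_time in batch_times_ms:
        end = max(start, offset_at(batch_time))
        ranges.append((start, end))
        start = end
    return ranges


batch_times = [1000, 2000, 3000]
print(batches_using_latest(batch_times, 0))        # boundaries with no delay
print(batches_using_latest(batch_times, 150))      # boundaries shift with delay
print(batches_using_batch_time(batch_times, 0))    # same boundaries ...
print(batches_using_batch_time(batch_times, 150))  # ... regardless of delay
```

In this toy model a 150 ms scheduling delay moves one message from the second
batch into the first when the latest offset is used, whereas the batch-time
variant yields identical ranges for both runs. Whether a real Kafka broker can
resolve offsets by time at this granularity is a separate question that the
sketch does not address.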