Bijay Singh Bisht created SPARK-10734:
-----------------------------------------

             Summary: DirectKafkaInputDStream uses OffsetRequest.LatestTime 
to find the latest offset; using the batch time would be more 
desirable.
                 Key: SPARK-10734
                 URL: https://issues.apache.org/jira/browse/SPARK-10734
             Project: Spark
          Issue Type: Improvement
            Reporter: Bijay Singh Bisht


DirectKafkaInputDStream uses OffsetRequest.LatestTime to find the latest 
offset. Because OffsetRequest.LatestTime is relative, the result depends on 
when the batch is actually scheduled. One would expect that, for a given input 
data set, the contents of each batch are predictable regardless of system 
conditions. Using the batch time instead means that stream processing produces 
the same batches regardless of when processing was started and of the load 
conditions on the system.
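
To illustrate the difference, here is a minimal, self-contained sketch in 
Python. It is a toy simulation, not Spark or Kafka code: ARRIVALS_MS, 
offset_at, plan_batches, and the delay values are all hypothetical names and 
numbers made up for this example. It assumes a single partition and that an 
offset can be looked up exactly for any timestamp.

```python
from bisect import bisect_right

# Hypothetical stand-in for one Kafka partition: the arrival time (ms) of
# each message, indexed by offset. This is NOT a Kafka or Spark API.
ARRIVALS_MS = [100, 250, 900, 1100, 1800, 2050, 2900, 3500]


def offset_at(time_ms):
    """Return the offset one past the last message that arrived by time_ms."""
    return bisect_right(ARRIVALS_MS, time_ms)


def plan_batches(batch_times_ms, scheduling_delays_ms, use_batch_time):
    """Compute the [start, end) offset range of each batch.

    use_batch_time=False mimics a "latest offset right now" lookup: the
    cutoff is the moment the batch is actually scheduled (batch time plus
    scheduling delay). use_batch_time=True uses the nominal batch time.
    """
    ranges, start = [], 0
    for batch_time, delay in zip(batch_times_ms, scheduling_delays_ms):
        cutoff = batch_time if use_batch_time else batch_time + delay
        end = offset_at(cutoff)
        ranges.append((start, end))
        start = end
    return ranges


BATCH_TIMES_MS = [1000, 2000, 3000]  # nominal one-second batch interval
IDLE_DELAYS_MS = [0, 0, 0]           # batches scheduled exactly on time
LOADED_DELAYS_MS = [150, 900, 0]     # assumed delays on a loaded system

# Cutoff = actual scheduling time: batch contents shift under load.
print(plan_batches(BATCH_TIMES_MS, IDLE_DELAYS_MS, use_batch_time=False))
# [(0, 3), (3, 5), (5, 7)]
print(plan_batches(BATCH_TIMES_MS, LOADED_DELAYS_MS, use_batch_time=False))
# [(0, 4), (4, 7), (7, 7)]

# Cutoff = batch time: same ranges whatever the scheduling delay.
print(plan_batches(BATCH_TIMES_MS, LOADED_DELAYS_MS, use_batch_time=True))
# [(0, 3), (3, 5), (5, 7)]
```

With the batch time as the cutoff, the offset ranges depend only on the input 
data and the batch interval, which is the property the proposal is after.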

Together with [SPARK-10732], this makes for nice regression scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
