[ https://issues.apache.org/jira/browse/SPARK-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-18682:
-------------------------------------
Assignee: Tyson Condie
> Batch Source for Kafka
> ----------------------
>
> Key: SPARK-18682
> URL: https://issues.apache.org/jira/browse/SPARK-18682
> Project: Spark
> Issue Type: New Feature
> Components: SQL, Structured Streaming
> Reporter: Michael Armbrust
> Assignee: Tyson Condie
>
> Today, you can start a stream that reads from Kafka. However, given Kafka's
> configurable retention period, sometimes you might just want to read all of
> the data that is available now. As such, we should add a version that works
> with {{spark.read}} as well.
> The options should be the same as the streaming kafka source, with the
> following differences:
> - {{startingOffsets}} should default to earliest, and should not allow
> {{latest}} (which would always be empty).
> - {{endingOffsets}} should also be allowed and should default to {{latest}}.
>   The same assign JSON format as {{startingOffsets}} should also be accepted.
> It would be really good if things like {{.limit\(n\)}} were enough to
> prevent all of the data from being read (this might just work).
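The option rules above can be pinned down with a small sketch. It is a minimal, hypothetical illustration in Python, not Spark's implementation: `resolve_batch_offsets` and `trim_to_limit` are made-up names, the `earliest`/`latest` dicts stand in for the per-partition offsets a Kafka consumer would report, and the `-2`/`-1` sentinels for earliest/latest in the JSON form are assumed to follow the streaming source's convention. `trim_to_limit` shows one possible way a `.limit(n)` could bound how much is read, assuming offsets are contiguous (no compaction gaps).

```python
import json

# Assumed sentinels, following the streaming Kafka source's JSON convention:
# -2 means "earliest" and -1 means "latest" for a given partition.
EARLIEST, LATEST = -2, -1


def _parse_offsets(spec, name, default, earliest, latest):
    """Turn an offsets option ("earliest", "latest", or JSON) into a dict."""
    if spec == "earliest":
        return dict(earliest)
    if spec == "latest":
        return dict(latest)
    resolved = dict(default)  # partitions absent from the JSON keep the default
    for topic, partitions in json.loads(spec).items():
        for partition, offset in partitions.items():
            tp = (topic, int(partition))  # JSON keys are strings
            if offset == EARLIEST:
                resolved[tp] = earliest[tp]
            elif offset == LATEST:
                if name == "startingOffsets":
                    raise ValueError("startingOffsets must not be latest for a batch read")
                resolved[tp] = latest[tp]
            else:
                resolved[tp] = offset
    return resolved


def resolve_batch_offsets(options, earliest, latest):
    """Return {(topic, partition): (start, end)} for a hypothetical batch read.

    `options` holds reader options as strings; `earliest` and `latest` map
    (topic, partition) to offsets, standing in for a real Kafka consumer.
    """
    start_spec = options.get("startingOffsets", "earliest")  # batch default
    end_spec = options.get("endingOffsets", "latest")        # batch default
    if start_spec == "latest":
        raise ValueError("startingOffsets=latest would always read nothing")
    start = _parse_offsets(start_spec, "startingOffsets", earliest, earliest, latest)
    end = _parse_offsets(end_spec, "endingOffsets", latest, earliest, latest)
    ranges = {}
    for tp in sorted(start):
        if tp not in end:
            raise ValueError(f"no ending offset for {tp}")
        if start[tp] > end[tp]:
            raise ValueError(f"start offset is after end offset for {tp}")
        ranges[tp] = (start[tp], end[tp])
    return ranges


def trim_to_limit(ranges, n):
    """Shrink offset ranges so that at most n offsets are scheduled in total.

    Assumes contiguous offsets, so end - start equals the record count.
    Partitions are filled in sorted order; empty ranges are dropped.
    """
    planned, remaining = {}, n
    for tp in sorted(ranges):
        start, end = ranges[tp]
        take = min(end - start, remaining)
        if take > 0:
            planned[tp] = (start, start + take)
            remaining -= take
    return planned
```

Filling partitions in order keeps the number of partitions touched small for a small `n`; a different planner could instead spread the budget across partitions. With log compaction, an offset range holds fewer records than its width, so this trimming could return fewer than `n` records.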
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)