[jira] [Updated] (SPARK-18682) Batch Source for Kafka

Michael Armbrust (JIRA) Fri, 13 Jan 2017 13:22:48 -0800

     [ https://issues.apache.org/jira/browse/SPARK-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael Armbrust updated SPARK-18682:
-------------------------------------
    Assignee: Tyson Condie

> Batch Source for Kafka
> ----------------------
>
>                 Key: SPARK-18682
>                 URL: https://issues.apache.org/jira/browse/SPARK-18682
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL, Structured Streaming
>            Reporter: Michael Armbrust
>            Assignee: Tyson Condie
>
> Today, you can start a stream that reads from Kafka.  However, given Kafka's
> configurable retention period, sometimes you might just want to read all of
> the data that is currently available.  As such, we should add a version
> that works with {{spark.read}} as well.
> The options should be the same as for the streaming Kafka source, with the
> following differences:
>  - {{startingOffsets}} should default to {{earliest}}, and should not allow
> {{latest}} (which would always be empty).
>  - {{endingOffsets}} should also be allowed and should default to {{latest}}.
> The same assign JSON format as {{startingOffsets}} should also be accepted.
> It would be really good if things like {{.limit(n)}} were enough to
> prevent all the data from being read (this might just work).
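
The option rules proposed above can be sketched as a small resolver. This is only an illustration of the defaults and the one restriction the description names, not Spark's implementation: the helper name resolve_batch_offsets and the BATCH_DEFAULTS table are hypothetical, and treating any value other than earliest/latest as a per-partition JSON object is an assumption based on the "assign JSON format" wording.

```python
import json

# Defaults proposed in the issue for the batch (spark.read) variant.
BATCH_DEFAULTS = {"startingOffsets": "earliest", "endingOffsets": "latest"}


def resolve_batch_offsets(options):
    """Return a copy of `options` with the proposed batch defaults applied.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    # User-supplied options override the proposed defaults.
    resolved = {**BATCH_DEFAULTS, **options}

    # A batch read that starts at "latest" would always be empty, so reject it.
    if resolved["startingOffsets"].strip().lower() == "latest":
        raise ValueError("startingOffsets must not be 'latest' for a batch read")

    # Assumption: anything that is not a keyword is a JSON object of
    # per-partition offsets, in the same format as startingOffsets.
    for key in ("startingOffsets", "endingOffsets"):
        value = resolved[key].strip()
        if value.lower() in ("earliest", "latest"):
            continue
        if not isinstance(json.loads(value), dict):
            raise ValueError(f"{key} must be 'earliest', 'latest', or a JSON object")

    return resolved
```

If the batch source ends up mirroring the streaming one, the resolved dictionary could presumably be passed along as spark.read.format("kafka").options(**resolved).load(); that call shape is an assumption about the proposed feature, not something this issue specifies.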


