[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453412#comment-15453412
]
Ofir Manor commented on SPARK-15406:
------------------------------------
For me, structured streaming is currently all about real window operations
based on event time (fields in the event), not processing time (already in 2.0,
with some limitations). In a future release it may also be about new
sink-related features (managing exactly-once delivery from Spark to relational
databases or HDFS, automatically doing upserts to databases).
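To make the event-time point concrete, here is a minimal sketch in plain Python (not Spark code) of counting events per tumbling window using the timestamp carried inside each event. The field name `event_time`, the 60-second width, and the sample events are assumptions made up for illustration.

```python
from collections import Counter


def window_start(event_time_s, width_s):
    """Return the start of the tumbling window that contains event_time_s."""
    return event_time_s - (event_time_s % width_s)


def count_by_event_time(events, width_s):
    """Count events per window, keyed by the time stored in the event itself.

    Arrival order is ignored, which is the difference from processing-time
    windows: a late event still lands in the window it belongs to.
    """
    counts = Counter()
    for event in events:
        counts[window_start(event["event_time"], width_s)] += 1
    return dict(counts)


# Hypothetical events, listed in arrival order. The third one arrives last
# but its event time places it in the first window.
events = [
    {"event_time": 5, "value": "a"},
    {"event_time": 65, "value": "b"},
    {"event_time": 59, "value": "c"},
]

counts = count_by_event_time(events, 60)
```

With processing-time windows the third event would be counted in whichever window was open when it arrived; here it is assigned by its own timestamp.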
So, I just want the same Kafka features as before. The value is in the new
processing capabilities; it just happens that my source of real-time events is
Kafka, not Parquet files (as in 2.0).
I expect a couple of things. First, some basic config control: a pointer to
Kafka (bootstrap servers), one or more topics, optionally an existing consumer
group or an offset definition, and optionally a kerberised connection. Second,
exactly-once processing from Kafka to Spark (including correct recovery
after a Spark node failure).
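The config surface described above could be sketched roughly as follows. This is plain Python that only builds a map of options; the function name and the option keys are hypothetical, loosely modeled on Kafka consumer property names, since the issue does not define any API. The broker addresses and topic name in the usage line are made up.

```python
def kafka_source_options(bootstrap_servers, topics, starting_offsets=None,
                         group_id=None, kerberos_service_name=None):
    """Build a flat option map for a hypothetical Kafka streaming source."""
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")
    if not topics:
        raise ValueError("at least one topic is required")

    # Required settings: where Kafka is and what to read.
    options = {
        "kafka.bootstrap.servers": ",".join(bootstrap_servers),
        "subscribe": ",".join(topics),
    }

    # Optional: where to start reading (an offset definition).
    if starting_offsets is not None:
        options["startingOffsets"] = starting_offsets

    # Optional: reuse an existing consumer group.
    if group_id is not None:
        options["kafka.group.id"] = group_id

    # Optional: kerberised connection (assumed to use SASL without TLS here).
    if kerberos_service_name is not None:
        options["kafka.security.protocol"] = "SASL_PLAINTEXT"
        options["kafka.sasl.kerberos.service.name"] = kerberos_service_name

    return options


# Hypothetical usage: two brokers, one topic, start from the earliest offsets.
opts = kafka_source_options(["broker1:9092", "broker2:9092"], ["events"],
                            starting_offsets="earliest")
```

Only the servers and topics are mandatory in this sketch; everything else is left out of the map unless explicitly requested, matching the "optionally" wording above.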
> Structured streaming support for consuming from Kafka
> -----------------------------------------------------
>
> Key: SPARK-15406
> URL: https://issues.apache.org/jira/browse/SPARK-15406
> Project: Spark
> Issue Type: New Feature
> Reporter: Cody Koeninger
>
> Structured streaming doesn't have support for Kafka yet. I personally feel
> like time-based indexing would make for a much better interface, but it's
> been pushed back to Kafka 0.10.1
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)