[ https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487310#comment-15487310 ]
Cody Koeninger commented on SPARK-15406:
----------------------------------------
So we can do the easiest thing possible for 2.0.1, which in this case is
probably taking my patch and replacing the hardcoded (String, String) types
for the message key/value with byte arrays.
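To make the point concrete, here is a minimal sketch of what carrying raw byte-array key/value means, as opposed to pre-decoded strings: deserialization becomes an explicit user step, as it is with the DStream's ConsumerRecord[K, V]. The class and field names here are illustrative, not the actual patch.

```java
import java.nio.charset.StandardCharsets;

// Illustrative record shape (hypothetical names, not Spark's API):
// key/value stay as byte[] so callers pick their own deserializer.
final class KafkaRecord {
    final String topic;
    final int partition;
    final long offset;
    final byte[] key;
    final byte[] value;

    KafkaRecord(String topic, int partition, long offset, byte[] key, byte[] value) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.key = key;
        this.value = value;
    }
}

public class Demo {
    public static void main(String[] args) {
        KafkaRecord rec = new KafkaRecord("t", 0, 0L,
                "k".getBytes(StandardCharsets.UTF_8),
                "v".getBytes(StandardCharsets.UTF_8));
        // Decoding is done by the user, not hardcoded in the source:
        String value = new String(rec.value, StandardCharsets.UTF_8);
        System.out.println(value);
    }
}
```

With a hardcoded (String, String) schema, anyone consuming Avro, Protobuf, or other binary payloads would have to round-trip through a lossy string conversion first.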
But if we do that, are you actually willing to allow the changes necessary for
people to get functionality comparable to the existing DStream / RDD interface
in 2.x.x? Or is this going to be another situation where you say "well, now
people are using this interface, so we're stuck with it"?
I'm worried about shoehorning part of the existing Kafka functionality into the
SourceProvider interface without considering changes to that interface,
especially if Kafka or Kafka-like sources are the only thing intended to work
with it. For instance, why are we sticking ourselves with a stringly-typed
interface and all the problems that causes, when Reynold has already admitted
that Python is always going to be a second-class citizen with regard to
streaming (e.g. SPARK-16534)?
> Structured streaming support for consuming from Kafka
> -----------------------------------------------------
>
> Key: SPARK-15406
> URL: https://issues.apache.org/jira/browse/SPARK-15406
> Project: Spark
> Issue Type: New Feature
> Reporter: Cody Koeninger
>
> Structured streaming doesn't have support for Kafka yet. I personally feel
> like time-based indexing would make for a much better interface, but it's
> been pushed back to Kafka 0.10.1:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)