[jira] [Commented] (SPARK-15406) Structured streaming support for consuming from Kafka

Cody Koeninger (JIRA) Tue, 13 Sep 2016 11:03:03 -0700

    [ https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487930#comment-15487930 ]


Cody Koeninger commented on SPARK-15406:
----------------------------------------

Specific examples:

Kafka has a type for a key, and a type for a value, with deserializers 
corresponding to those types.  In other words, I need to construct a 
KafkaRDD[K, V] in the getBatch method.  How do I communicate a parameterized 
type for K and V through this interface?
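To make the first point concrete, here is a minimal sketch. It assumes a source whose only configuration channel is a string map (in Spark 2.0, `StreamSourceProvider.createSource` receives its options as a `Map[String, String]`). `Deserializer`, `Utf8Deserializer`, `IntDeserializer`, `SourceOptionsSketch`, `TypedSourceConfig` and the `keyFormat`/`valueFormat` option names are all hypothetical, not Kafka or Spark APIs. With strings alone, the source can still pick a deserializer at runtime, but K and V collapse to `Any`. A typed configuration object keeps them.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical, simplified stand-in for a deserializer; NOT Kafka's Deserializer interface.
trait Deserializer[T] {
  def deserialize(bytes: Array[Byte]): T
}

object Utf8Deserializer extends Deserializer[String] {
  def deserialize(bytes: Array[Byte]): String = new String(bytes, UTF_8)
}

object IntDeserializer extends Deserializer[Int] {
  def deserialize(bytes: Array[Byte]): Int = ByteBuffer.wrap(bytes).getInt
}

object SourceOptionsSketch {
  // String-only options: deserializers can be looked up by (hypothetical) option values,
  // but the compiler no longer knows K and V, so records come back as (Any, Any).
  private val registry: Map[String, Deserializer[_]] =
    Map("string" -> Utf8Deserializer, "int" -> IntDeserializer)

  def decodeUntyped(options: Map[String, String], key: Array[Byte], value: Array[Byte]): (Any, Any) =
    (registry(options("keyFormat")).deserialize(key), registry(options("valueFormat")).deserialize(value))

  // Typed configuration: K and V stay visible to the compiler end to end.
  final case class TypedSourceConfig[K, V](keys: Deserializer[K], values: Deserializer[V])

  def decodeTyped[K, V](conf: TypedSourceConfig[K, V], key: Array[Byte], value: Array[Byte]): (K, V) =
    (conf.keys.deserialize(key), conf.values.deserialize(value))
}

object SourceOptionsDemo {
  import SourceOptionsSketch._

  def main(args: Array[String]): Unit = {
    val keyBytes = "user-1".getBytes(UTF_8)
    val valueBytes = ByteBuffer.allocate(4).putInt(42).array()

    val untyped = decodeUntyped(Map("keyFormat" -> "string", "valueFormat" -> "int"), keyBytes, valueBytes)
    // untyped._2 is statically Any: arithmetic on it would need a runtime cast.
    println(untyped) // (user-1,42)

    val typed = decodeTyped(TypedSourceConfig(Utf8Deserializer, IntDeserializer), keyBytes, valueBytes)
    println(typed._2 + 1) // 43: typed._2 is statically an Int
  }
}
```

The runtime registry lookup works, but every consumer of `decodeUntyped` has to cast; that cast is what a parameterized interface would avoid.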

ConsumerStrategy allows for user-defined implementations.  This is necessary 
because getting Kafka consumers set up correctly is stateful, and one size 
doesn't fit all.  How do I communicate a user-defined ConsumerStrategy through 
this interface?
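A similar sketch for the ConsumerStrategy point. It uses only simplified, hypothetical stand-ins: `TopicPartition`, `InMemoryConsumer`, `SimpleConsumerStrategy`, `AssignPartitions` and `ResumeFromStore` are not the real Kafka or spark-streaming-kafka-0-10 classes. A user-defined strategy that carries a function, such as a lookup into an external offset store, is ordinary code; there is no natural way to encode it in a string-to-string options map, short of naming a class and instantiating it reflectively.

```scala
// Hypothetical, simplified stand-ins: NOT Kafka's or spark-streaming-kafka-0-10's classes.
final case class TopicPartition(topic: String, partition: Int)

// Tracks which partitions are assigned and the offset each one will start reading from.
final class InMemoryConsumer {
  private var assigned: Set[TopicPartition] = Set.empty
  private var positions: Map[TopicPartition, Long] = Map.empty

  def assign(tps: Set[TopicPartition]): Unit = assigned = tps

  def seek(tp: TopicPartition, offset: Long): Unit = {
    require(assigned.contains(tp), s"$tp is not assigned")
    positions += tp -> offset
  }

  def position(tp: TopicPartition): Option[Long] = positions.get(tp)
}

// Pluggable, stateful setup step: given checkpointed offsets, return a ready consumer.
trait SimpleConsumerStrategy {
  def onStart(checkpointed: Map[TopicPartition, Long]): InMemoryConsumer
}

// A "one size" strategy: fixed partitions, resume only from checkpointed offsets.
final case class AssignPartitions(partitions: Set[TopicPartition]) extends SimpleConsumerStrategy {
  def onStart(checkpointed: Map[TopicPartition, Long]): InMemoryConsumer = {
    val consumer = new InMemoryConsumer
    consumer.assign(partitions)
    for ((tp, offset) <- checkpointed if partitions.contains(tp)) consumer.seek(tp, offset)
    consumer
  }
}

// A user-defined strategy: fall back to an external offset store (here just a function)
// for partitions with no checkpoint. A closure like `lookup` has no natural
// Map[String, String] encoding.
final class ResumeFromStore(partitions: Set[TopicPartition], lookup: TopicPartition => Option[Long])
    extends SimpleConsumerStrategy {
  def onStart(checkpointed: Map[TopicPartition, Long]): InMemoryConsumer = {
    val consumer = new InMemoryConsumer
    consumer.assign(partitions)
    for (tp <- partitions; offset <- checkpointed.get(tp).orElse(lookup(tp)))
      consumer.seek(tp, offset)
    consumer
  }
}

object ConsumerStrategyDemo {
  def main(args: Array[String]): Unit = {
    val p0 = TopicPartition("events", 0)
    val p1 = TopicPartition("events", 1)
    val externalStore = Map(p0 -> 100L, p1 -> 250L)

    val strategy: SimpleConsumerStrategy = new ResumeFromStore(Set(p0, p1), externalStore.get)
    val consumer = strategy.onStart(Map(p0 -> 120L)) // only p0 has a checkpoint
    println(consumer.position(p0)) // Some(120)
    println(consumer.position(p1)) // Some(250)
  }
}
```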

More prosaically: suppose we tell Scala end users that the way to communicate 
a mapping from TopicPartition objects to starting offsets is to pass in a JSON 
string.  If I were evaluating a new library from an outside perspective and 
saw that, I'd say nope and walk away.  Any language that can handle JSON can 
handle nested maps, so at the very least nested maps seem like a better lowest 
common denominator than a string.
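For the starting-offsets point, a minimal sketch of the same data in both shapes: the nested topic -> partition -> offset map, and the JSON string that a string-only option would force a Scala caller to hand-render. `TopicPartition` here is a hypothetical case class, not Kafka's class, `StartingOffsetsSketch` is a hypothetical helper, and the JSON layout (one object per topic, keyed by partition number) is an assumption for illustration.

```scala
// Hypothetical stand-in for Kafka's TopicPartition; NOT the real class.
final case class TopicPartition(topic: String, partition: Int)

object StartingOffsetsSketch {
  // Group a flat TopicPartition -> offset mapping into topic -> (partition -> offset),
  // a nested-map shape that any language able to produce JSON can also produce.
  def toNested(offsets: Map[TopicPartition, Long]): Map[String, Map[Int, Long]] =
    offsets.groupBy(_._1.topic).map { case (topic, entries) =>
      topic -> entries.map { case (tp, offset) => tp.partition -> offset }
    }

  private def quoted(s: String): String = "\"" + s + "\""

  // What a string-only option forces on a Scala caller: hand-rendered JSON.
  // Keys are sorted for stable output; topic names are assumed to need no escaping.
  def toJson(nested: Map[String, Map[Int, Long]]): String =
    nested.toSeq.sortBy(_._1).map { case (topic, parts) =>
      val inner = parts.toSeq.sortBy(_._1).map { case (p, offset) => quoted(p.toString) + ":" + offset }
      quoted(topic) + ":" + inner.mkString("{", ",", "}")
    }.mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    val offsets = Map(
      TopicPartition("topicA", 0) -> 23L,
      TopicPartition("topicA", 1) -> 42L,
      TopicPartition("topicB", 0) -> 7L
    )
    val nested = toNested(offsets)
    println(nested("topicA")(1)) // 42
    println(toJson(nested))      // {"topicA":{"0":23,"1":42},"topicB":{"0":7}}
  }
}
```

The nested map already carries all the structure; the JSON step only adds quoting and escaping concerns on top of it.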

Again, I'm not trying to be obstructionist here, I am generally in favor of 
doing the simplest thing that works.  But I really have very little confidence 
that doing the expedient thing now isn't going to prevent us from doing the 
right thing later.

> Structured streaming support for consuming from Kafka
> -----------------------------------------------------
>
>                 Key: SPARK-15406
>                 URL: https://issues.apache.org/jira/browse/SPARK-15406
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Cody Koeninger
>
> This is the parent JIRA to track all the work for building a Kafka source 
> for Structured Streaming. Here is the design doc for an initial version of 
> the Kafka Source.
> https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
> ================== Old description =========================
> Structured streaming doesn't have support for Kafka yet.  I personally feel 
> like time-based indexing would make for a much better interface, but it's 
> been pushed back to Kafka 0.10.1:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

