[ https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487930#comment-15487930 ]
Cody Koeninger commented on SPARK-15406:
----------------------------------------

Specific examples:

Kafka has a type for a key, and a type for a value, with deserializers corresponding to those types. In other words, I need to construct a KafkaRDD[K, V] in the getBatch method. How do I communicate a parameterized type for K and V through this interface?

ConsumerStrategy allows for user-defined implementations. This is necessary because getting Kafka consumers set up correctly is stateful, and one size doesn't fit all. How do I communicate a user-defined ConsumerStrategy through this interface?

More prosaically, telling Scala end users that the way they need to communicate a mapping from TopicPartition objects to starting offsets is to pass in a JSON string... if I were evaluating a new library from an outside perspective and saw that, I'd say nope and walk away. Any language that can handle JSON can handle nested maps, so at the very least that seems like a better lowest common denominator than a string.

Again, I'm not trying to be obstructionist here; I am generally in favor of doing the simplest thing that works. But I really have very little confidence that doing the expedient thing now isn't going to prevent us from doing the right thing later.

> Structured streaming support for consuming from Kafka
> -----------------------------------------------------
>
>                 Key: SPARK-15406
>                 URL: https://issues.apache.org/jira/browse/SPARK-15406
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Cody Koeninger
>
> This is the parent JIRA to track all the work for building a Kafka source
> for Structured Streaming. Here is the design doc for an initial version of
> the Kafka Source.
> https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
> ================== Old description =========================
> Structured streaming doesn't have support for kafka yet.
> I personally feel like time based indexing would make for a much better
> interface, but it's been pushed back to kafka 0.10.1
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
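
The comment's point about starting offsets can be illustrated with a small sketch. It is not the Spark or Kafka API: the `TopicPartition` case class below is a hypothetical stand-in for Kafka's `org.apache.kafka.common.TopicPartition` so the snippet compiles on its own, and `StartingOffsets`, `toNested`, and `fromNested` are names invented for illustration. The JSON literal is only an assumed shape for contrast, not the format from the design doc.

```scala
// Hypothetical stand-in for org.apache.kafka.common.TopicPartition,
// defined here so the sketch compiles without Kafka on the classpath.
final case class TopicPartition(topic: String, partition: Int)

// Illustrative helper; none of these names come from Spark or Kafka.
object StartingOffsets {
  // Typed form: what a Scala caller would naturally build.
  type Typed = Map[TopicPartition, Long]

  // Nested-map form: topic -> (partition -> offset). It uses only strings,
  // integers and maps, so any language binding can express it.
  type Nested = Map[String, Map[Int, Long]]

  // Group the typed offsets by topic, then key each group by partition.
  def toNested(offsets: Typed): Nested =
    offsets.groupBy { case (tp, _) => tp.topic }.map { case (topic, entries) =>
      topic -> entries.map { case (tp, offset) => tp.partition -> offset }
    }

  // Flatten the nested form back into typed TopicPartition keys.
  def fromNested(nested: Nested): Typed =
    for {
      (topic, partitions) <- nested
      (partition, offset) <- partitions
    } yield TopicPartition(topic, partition) -> offset
}

object StartingOffsetsDemo extends App {
  val typed: StartingOffsets.Typed = Map(
    TopicPartition("events", 0) -> 42L,
    TopicPartition("events", 1) -> 17L,
    TopicPartition("audit", 0) -> 0L
  )

  // The same information as an opaque string (assumed shape, for contrast):
  // the compiler cannot check it, and every caller must build it by hand.
  val asJsonString = """{"events":{"0":42,"1":17},"audit":{"0":0}}"""

  val nested = StartingOffsets.toNested(typed)

  // The nested form round-trips to the typed form without any parsing.
  assert(StartingOffsets.fromNested(nested) == typed)
  println(nested)
}
```

The nested map carries the same data as the string but stays structured on the caller's side, which is the "lowest common denominator" argument made above. It does not address the separate questions about the `K` and `V` type parameters or a user-defined ConsumerStrategy.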