[GitHub] spark issue #15407: [SPARK-17841][STREAMING][KAFKA] drain commitQueue

2016-10-17 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15407 This doesn't affect correctness (only the highest offset for a given partition is used in any case), just memory leaks. I'm not sure what a good way to unit test memory leaks is, short

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-16 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r83551653 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -232,6 +232,42 @@ private[kafka010] case class

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-15 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15504 [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream ## What changes were proposed in this pull request? startingOffsets takes specific per-topicpartition

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-13 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 LGTM, thanks for talking it through --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #15397: [SPARK-17834][SQL]Fetch the earliest offsets manu...

2016-10-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15397#discussion_r83289086 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -270,7 +281,7 @@ private[kafka010] case class

[GitHub] spark pull request #15397: [SPARK-17834][SQL]Fetch the earliest offsets manu...

2016-10-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15397#discussion_r83289072 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -256,8 +269,6 @@ private[kafka010] case class

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 My main point is that whoever implements SPARK-17812 is going to have to deal with the issue shown in SPARK-17782, which means much of this patch is going to need to be changed anyway

[GitHub] spark pull request #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race co...

2016-10-12 Thread koeninger
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/15387 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #15401: [SPARK-17782][STREAMING][KAFKA] alternative eliminate ra...

2016-10-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15401 @zsxwing so poll is only called in consumer strategy in situations in which starting offsets have been provided, and seek is called immediately thereafter for those offsets. What is the specific

[GitHub] spark pull request #15442: [SPARK-17853][STREAMING][KAFKA][DOC] make it clea...

2016-10-11 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15442 [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad ## What changes were proposed in this pull request? Documentation fix to make it clear that reusing group

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82857280 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 If you're worried about it then accept the alternative PR I linked. On Sun, Oct 9, 2016 at 11:37 PM, Shixiong Zhu <notificati...@github.com> wrote: > During the

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 Look at the poll/seek implementation in the DStream's subscribe and subscribe pattern when user offsets are provided, i.e. the problem that triggered this ticket to begin with. You're going

[GitHub] spark pull request #15407: [SPARK-17841][STREAMING][KAFKA] drain commitQueue

2016-10-09 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15407 [SPARK-17841][STREAMING][KAFKA] drain commitQueue ## What changes were proposed in this pull request? Actually drain commit queue rather than just iterating it ## How

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 Let me know if you guys like that alternative PR better --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #15401: [SPARK-17782][STREAMING][KAFKA] alternative elimi...

2016-10-07 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15401 [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 If the concern is TD's comment, "Future calls to {@link #poll(long)} will not return any records from these partitions until they have been resumed using {@link #resume(Colle

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 How is this going to work with assign? It seems like it's just avoiding the problem, not fixing it. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 Poll also isn't going to return you just messages for a single topicpartition, so to do what you're suggesting you'd have to go through all the messages and do additional processing

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 You dont want poll consuming messages, its not just about offset correctness, the driver shouldnt be spending time or bandwidth doing that. What is the substantive concern

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 I set auto commit to false, and still recreated the test failure. That makes sense to me, consumer position should still be getting updated in memory even if it isn't saved to storage

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 I'm not going to say anything is impossible, which is the point of the assert. If it does somehow happen, it will be at start, so should be obvious. The whole poll 0 / pause thing

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15367 Does backporting reduce the likelihood of change if user feedback indicates we got it wrong? My technical concerns were largely addressed, that's my big remaining organizational concern

[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15355 @zsxwing good eye, thanks. It's not that auto.offset.reset.earliest doesn't work, it's that there's a potential race condition that poll gets called twice slowly enough for consumer position

[GitHub] spark pull request #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race co...

2016-10-06 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15387 [SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice ## What changes were proposed in this pull request? Kafka consumers can't subscribe or maintain heartbeat without

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 @jnadler Have you had a chance to try this out and see whether it addresses your issue? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15355 I have generally been unable to reproduce these kinds of test failures on my local environment, and don't have access to the build server, so trying fix without repro is pretty much shooting

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81642196 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81623277 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -0,0 +1,185 @@ +--- +layout: global +title: Structured Streaming + Kafka

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r81622511 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,281 @@ +/* + * Licensed

[GitHub] spark issue #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for structure...

2016-10-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15307 @tdas I wasn't asking if this patch was intended to implement rate limiting, I was asking if it was intended to be used to support the implementation of rate limiting in the future. In other

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81615342 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -185,25 +203,36 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r8163 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81590359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81584577 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -273,8 +304,14 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81582674 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -185,25 +203,36 @@ class StreamExecution

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > It would be nice to be able to do something other than earliest/latest. That's what Assign and the starting offset arguments to the Subscribe strateg

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 Ok, so this kind of thing is why I was concerned about the copy, paste, randomly change things approach to developing this module. > (5) Topics are deleted when a Spark job is runi

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80793355 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed

[GitHub] spark issue #13998: [SPARK-12177][Streaming][Kafka] limit api surface area

2016-09-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/13998 I ran that test 100 times locally w/out error... you have any suggestions on repro? On Mon, Sep 26, 2016 at 6:40 PM, Cody Koeninger <c...@koeninger.org> wrote: >

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80617904 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 Ok, finished a line-by-line compare + comment. The biggest thing I'm having trouble reconciling is the stated emphasis on limiting user options in order to give guarantees, yet throwing

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80617145 --- Diff: external/kafka-0-10-sql/pom.xml --- @@ -0,0 +1,82 @@ + + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80617036 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala --- @@ -0,0 +1,163 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80616878 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala --- @@ -0,0 +1,163 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80616386 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala --- @@ -0,0 +1,163 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80616098 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala --- @@ -0,0 +1,163 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80615842 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80615307 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80614899 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80614481 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80613358 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80613155 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80612809 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80612619 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80612293 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80611556 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80611465 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80610814 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80610748 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80610562 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80610210 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80609930 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80609718 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80609252 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80609099 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80607976 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80607882 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80607795 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80607406 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80606631 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -149,6 +160,14 @@ private[kafka010] case class

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r80605915 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark issue #13998: [SPARK-12177][Streaming][Kafka] limit api surface area

2016-09-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/13998 Sure I'll give it another look On Sep 26, 2016 3:46 PM, "Tathagata Das" <notificati...@github.com> wrote: > @koeninger <https://github.com/koeninger&

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 Source.getOffset returns an sql Offset, not a kafka offset. At this point the current plan is to remove the ordering requirement for sql Offset, which makes the whole discussion

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 We aren't comparing ordering of offsets across partitions, and I don't think that was ever in consideration. At this point the most likely candidate for the global ordering is implicit

[GitHub] spark issue #15207: [SPARK-17643] Remove comparable requirement from Offset

2016-09-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15207 LGTM. You probably already checked this, but FWIW I verified the kafka topic deletion test does pass once this is merged: https://github.com/koeninger/spark-1/tree/kafka-source

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > I agree that if/when we add that ability to add existing partitions midstream we'd probably need to add two offsets in to the SQL offset for new partitions. It's not just exist

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 @tdas I think as long as marmbrus' PR to remove comparable from the interface works for sane variations of subscription changes it's the best way to go. I'm honestly fine with someone getting

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > For streaming you already know what the global order is, because you know when you asked for A and B. I agree that we should probably remove the comparable requirement from Offset in fa

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 @tdas moving this conversation back to the PR that's linked from the public jira > yeah, i am trying to figure out all the options and write up something to so that we are cl

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 This is pretty much the fundamental issue. Kafka offsets alone aren't capable of meeting the SQL Offset interface as defined. I think that means the Offset interface needs

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-21 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > I'd want to see some test cases though that show why the current implementation is wrong from an end-user perspective if it needs to block merging initial kafka support.

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > We are not giving the developer the option to manually configure a consumer in this way for this PR precisely because I don't think we can while still maintaining the semantics that structu

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 My fork is not following auto.offset.reset, it's following what the (potentially user-provided) consumer does when it sees a new partition. Maybe that's auto.offset.reset, maybe it's

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 Absent a direct answer, I'm going to read that as "Yes I've made up my mind, no I will not consider PRs to the contrary." If you want pointers to what I think the correct impl

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r79680339 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 You should not be assuming 0 for a starting offset for partitions you've just learned about. You should be asking the underlying driver consumer what its position is. This is yet another

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-09-19 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 I personally wouldn't encourage people to use receivers unless they had a very specific reason to. The kafka-0-10 submodule doesn't even have a way to create receiver based streams, so I'm

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-19 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 I'm not concerned about people deleting partitions before messages have been processed, because they can take care of that problem themselves, by not deleting things until consuming has

[GitHub] spark pull request #15132: [SPARK-17510][STREAMING][KAFKA] config max rate o...

2016-09-17 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15132 [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis ## What changes were proposed in this pull request? Allow configuration of max rate on a per-topicpartition basis

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-15 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > I pushed for this code to be copied rather than refactored because I think this is the right direction long term. While it is nice to minimize inter-project dependencies, that is not rea

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-15 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r79094763 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-15 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r79093876 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-15 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15102#discussion_r79092976 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -0,0 +1,446 @@ +/* + * Licensed

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-15 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 * This already does depend on most of the existing Kafka DStream implementation. The fact that most of it was copied wholesale proves that. If you're just saying that you don't want it to have

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-14 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 Just from a very brief look, this duplicates egregious amounts of code from the existing Kafka submodule and doesn't handle offset ordering correctly when topicpartitions change. I'd -1

[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-14 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/14981 Yeah, I don't know of an easier workaround. My understanding is also that the asf concern is tied to distribution, so not publishing to maven should be sufficient. On Sep 14

[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-09 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/14981 Yeah, if it's cleaner to remove the kinesis-asl-assembly module I don't think it's a serious hardship to users. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-09 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/14981 Isn't that mostly down to whether someone wants to just put the whole assembly on their classpath, vs install the project and depend on it in their build tool? I can see why someone would want

[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-09 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/14981 My 2 cents are that we should make things as easy as possible for users, within the bounds of what ASF legal is willing to tolerate ;) Which probably means having it exist, but not published

<    1   2   3   4   5   6   7   >