[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72790044 The warning is for metadata.broker.list, since its not expected by the existing ConsumerConfig (its used by other config classes) Couldn't get subclassing

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72783141 Yeah, more importantly it's so defaults for things like connection timeouts match what kafka provides. It's possible to assign fake zookeeper.connect

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019631 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24018460 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72691392 Regarding naming, I agree. The name has been a point of discussion for a month, how to get some consensus? Regarding Java wrappers, there have already been

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24017652 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019904 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019256 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24017456 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019815 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24018579 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24020676 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24021621 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24020938 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaClusterSuite.scala --- @@ -0,0 +1,73 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24021039 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24030347 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24031550 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24033509 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24031208 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72758821 Besides introducing 2 classes where 1 would do, it implies that there are (or could be) multiple implementations of the abstract class. You're not using

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72760900 To put it another way, the type you return has to be public. If you return a public abstract class, what are you going to do when someone else subclasses

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72754442 Like patrick said, I really don't see any reason not to just expose KafkaRDD. You can still hide its constructor without making a superflous abstract class

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72757731 Just make the simplified createRDD return a static type of RDD[(K, V)], that's what I'm saying. You're already going to have to deal with those other type

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72773257 To be clear, I'm ok with any solution that gives me access to what I need, which in this case are offsets. What's coming across as me feeling strongly about

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72775631 Hey man, I'd rather talk about the code anyway. I think there's just something I'm missing as far as your underlying assumptions about interfaces go :) Thanks

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23974964 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,150

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23972944 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23972610 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,150

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23904918 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23904616 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23889820 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23890348 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23890594 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23726952 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23737948 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23729727 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23729934 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala --- @@ -130,7 +130,7 @@ abstract class

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23729871 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746235 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746262 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746302 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala --- @@ -130,7 +130,7 @@ abstract class

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746360 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71968759 packaging, makes sense method name, agreed, named it createNewStream for now offset range, see my explanation of the interface above. I think

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23748806 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23616474 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-27 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71667244 Most of Either's problems can be fixed with a one-line implicit conversion to RightProjection. I've seen scalactic before, seems like overkill by comparison

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71549132 Just updated it On Mon, Jan 26, 2015 at 4:06 PM, Hari Shreedharan notificati...@github.com wrote: @koeninger https://github.com/koeninger Can you

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23576495 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala --- @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71564796 I'm not a big fan of either either :) The issue here is that KafkaCluster is potentially dealing with multiple exceptions due to multiple brokers

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71565526 I think as long as offsets are available for advanced users that want them, relying on checkpointing for the happy path should be ok. Will probably be some

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23418815 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,123

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71122072 I need to know, perhaps even at the driver, what the ending offset is in order to be able to commit it. I also have several use cases where I want to end

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71120030 Yeah, it's pulled down every batch interval. That way you know exactly what the upper and lower bounds of the offsets are. On Thu, Jan 22, 2015 at 5:15 PM

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71131380 Point is, it's up to client code to commit, so that it can implement exactly-once semantics if necessary. Committing automatically at the end of compute would

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-13 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-69754083 1. Yes, I removed ivy and maven cache, verified the example app failed to locate the dependency, re-published from the spark dev version, verified the example

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-69651872 The classloader issue was when reading from the checkpoint. If we want to rely on subclassing, some of the implementation (e.g. currentOffsets

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-69656800 Yeah, this is on a local development version, after assembly / publish local. Here's a gist of the exception and the diff that causes it (using

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-69446001 I went ahead and implemented locality and checkpointing of generated rdds. Couple of points - still depends on SPARK-4014 eventually being merged

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22653565 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,313 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22653442 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,313 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22653905 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68804547 I'm hopeful that SPARK-4014 will be finalized soon, waiting on that before doing the refactor for preferred locations. That will involve changing the partition

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22498572 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-01-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r22446683 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala --- @@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22446804 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,123

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22362375 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22364279 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,123

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22364358 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22366068 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,123

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22366140 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22366683 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68405794 Thanks for this. Most of the uses of attemptId I've seen look like they were assuming it meant the 0-based attempt number. --- If your project is set up for it, you

[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68407388 The flip side is that it's already documented as doing the right thing: http://spark.apache.org/docs/1.1.1/api/scala/index.html#org.apache.spark.TaskContext

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22330325 --- Diff: external/kafka/pom.xml --- @@ -44,7 +44,7 @@ dependency groupIdorg.apache.kafka/groupId artifactIdkafka_

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68307147 I got some good feedback from Koert Kuipers at Tresata regarding location awareness, so I'll be doing some refactoring to add that. --- If your project is set up

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68149432 Hi @jerryshao I'd politely ask that anyone with questions read at least KafkaRDD.scala and the example usage linked from the jira ticket (it's only about 50

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-24 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3798 [SPARK-4964] [Streaming] Exactly-once semantics for Kafka You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 kafkaRdd

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67337516 Jenkins is failing org.apache.spark.scheduler.SparkListenerSuite.local metrics org.apache.spark.streaming.flume.FlumeStreamSuite.flume input compressed

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21610025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21638810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21192361 --- Diff: docs/configuration.md --- @@ -664,6 +665,24 @@ Apart from these, the following properties are also available, and may be useful /td

[GitHub] spark pull request: Closes SPARK-4229 Create hadoop configuration ...

2014-12-01 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3543 Closes SPARK-4229 Create hadoop configuration in a consistent way You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK

[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-12-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3102#issuecomment-65176731 Yes, the new hadoop config documentation is just documenting the behavior of SparkHadoopUtil.scala lines 95-100 Sorry about the branch situation, I was unclear

[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-12-01 Thread koeninger
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/3102 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-11-04 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3102 Spark 4229 Create hadoop configuration in a consistent way You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-4229

[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55336261 @marbrus I see what you mean. Updated to basically what you suggested, aside from building the map once. Let me know, once it's finalized I can try to test one more

[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/2345 SPARK-3462 push down filters and projections into Unions You can merge this pull request into a Git repository by running: $ git pull https://github.com/mediacrossinginc/spark SPARK-3462

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17279539 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17279806 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag]( logInfo(statement fetch size

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17280404 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag]( }).toArray

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17280425 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag]( }).toArray

[GitHub] spark pull request: SPARK-1097: Do not introduce deadlock while fi...

2014-07-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/1409#issuecomment-49066268 Testing that patch, it seems to have fixed the deadlock we were seeing in production. --- If your project is set up for it, you can reply to this email and have your

<    2   3   4   5   6   7