Recovery for Spark Streaming Kafka Direct with OffsetOutOfRangeException

2016-01-21 Thread Dan Dutrow
Hey Cody, I would have responded to the mailing list but it looks like this thread got aged off. I have a problem where one of my topics dumps more data than my Spark job can keep up with. We limit the input rate with maxRatePerPartition. Eventually, when the data is aged off, I get the
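
A minimal sketch of the knobs in play here, assuming the Spark 1.x direct API: spark.streaming.kafka.maxRatePerPartition caps the per-partition ingest rate, and auto.offset.reset set to "smallest" controls where a newly created stream starts when the requested offsets have already been aged off. Broker list, topic name, batch interval, and rate are placeholder values.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Cap how many messages per second each Kafka partition contributes to a batch.
val conf = new SparkConf()
  .setAppName("direct-stream-rate-limit")
  .set("spark.streaming.kafka.maxRatePerPartition", "10000")

val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092,broker2:9092",
  // Consulted only when no explicit offsets are given: "smallest" starts from the
  // earliest retained offset instead of the default "largest".
  "auto.offset.reset" -> "smallest")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("busy-topic"))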

Re: Spark Stream Monitoring with Kafka Direct API

2015-12-09 Thread Dan Dutrow
he offset ranges. See the definition of count() in recent versions of KafkaRDD for an example. On Wed, Dec 9, 2015 at 9:39 AM, Dan Dutrow <dan.dut...@gmail.com> wrote: Is there documentation for how to update the metrics (#messages per batch) i
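
For the per-batch counts discussed here, a minimal sketch of deriving them yourself from the offset ranges each batch carries (building on the stream defined in the sketch above); the logging is a placeholder:

import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  // Each batch's KafkaRDD knows the exact offset range it will read per partition,
  // so the batch size is just the sum of (untilOffset - fromOffset).
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val messagesInBatch = ranges.map(r => r.untilOffset - r.fromOffset).sum
  println(s"batch contains $messagesInBatch messages across ${ranges.length} partitions")
}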

Spark Stream Monitoring with Kafka Direct API

2015-12-09 Thread Dan Dutrow
Is there documentation for how to update the metrics (#messages per batch) in the Spark Streaming tab when using the Direct API? Does the Streaming tab get its information from Zookeeper or something else internally? -- Dan ✆

Re: Kafka - streaming from multiple topics

2015-12-03 Thread Dan Dutrow
Hey Cody, I'm convinced that I'm not going to get the functionality I want without using the Direct Stream API. I'm now looking through https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md#exactly-once-using-transactional-writes where you say "For the very first time the job is
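
A hedged sketch of the transactional-writes idea from that post: store the batch's results and its offset ranges in one database transaction, so a replayed batch either commits once or rolls back. The JDBC URL, table names, and the countByValue aggregation are illustrative assumptions, not the blog post's code.

import java.sql.DriverManager
import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val counts = rdd.map(_._2).countByValue() // toy aggregation, collected to the driver

  val conn = DriverManager.getConnection("jdbc:postgresql://db:5432/streaming")
  try {
    conn.setAutoCommit(false)
    val insert = conn.prepareStatement("INSERT INTO word_counts (word, cnt) VALUES (?, ?)")
    counts.foreach { case (word, cnt) =>
      insert.setString(1, word); insert.setLong(2, cnt); insert.addBatch()
    }
    insert.executeBatch()
    val offsets = conn.prepareStatement(
      "UPDATE kafka_offsets SET until_offset = ? WHERE topic = ? AND partition = ?")
    ranges.foreach { r =>
      offsets.setLong(1, r.untilOffset); offsets.setString(2, r.topic)
      offsets.setInt(3, r.partition); offsets.addBatch()
    }
    offsets.executeBatch()
    conn.commit() // results and offsets succeed or fail together
  } finally {
    conn.close()
  }
}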

Re: Kafka - streaming from multiple topics

2015-12-02 Thread Dan Dutrow
Sigh... I want to use the direct stream and have recently brought in Redis to persist the offsets, but I really like and need to have real-time metrics on the GUI, so I'm hoping to have both the Direct and Receiver streams working. On Wed, Dec 2, 2015 at 3:17 PM Cody Koeninger
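
A minimal sketch of persisting direct-stream offsets to Redis after each batch, assuming the Jedis client; the host, group name, and key layout (one hash per group/topic, keyed by partition) are assumptions:

import org.apache.spark.streaming.kafka.HasOffsetRanges
import redis.clients.jedis.Jedis

stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val jedis = new Jedis("redis-host", 6379)
  try {
    ranges.foreach { r =>
      // Field = partition, value = next offset to read on restart.
      jedis.hset(s"offsets:mygroup:${r.topic}", r.partition.toString, r.untilOffset.toString)
    }
  } finally {
    jedis.close()
  }
}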

Re: Different Kafka createDirectStream implementations

2015-09-08 Thread Dan Dutrow
set is only useful for determining whether to start the stream at the beginning or the end of the log... if someone's provided explicit offsets, it's pretty clear where to start the stream. On Tue, Sep 8, 2015 at 1:19 PM, Dan Dutrow <dan.dut...@gmail.com> wrote: Y

Different Kafka createDirectStream implementations

2015-09-08 Thread Dan Dutrow
The two methods of createDirectStream appear to have different implementations: the second checks the offset.reset flags and does some error handling, while the first does not. Besides the use of a messageHandler, are they intended to be used in different situations? def createDirectStream[ K:
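
For reference, a sketch of calling the two overloads side by side (Spark 1.x direct API, reusing the ssc and kafkaParams values from the first sketch; topic, partition, and starting offset are placeholders):

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Simpler overload: topics only; starting offsets are resolved from
// auto.offset.reset ("largest" by default, "smallest" if requested).
val byTopic = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("my-topic"))

// Second overload: explicit per-partition starting offsets plus a messageHandler
// that shapes each record, so no offset-reset handling is needed.
val fromOffsets = Map(TopicAndPartition("my-topic", 0) -> 0L)
val handler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
val byOffset = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder,
  (String, String)](ssc, kafkaParams, fromOffsets, handler)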

Re: Different Kafka createDirectStream implementations

2015-09-08 Thread Dan Dutrow
On Tue, Sep 8, 2015 at 11:44 AM, Dan Dutrow <dan.dut...@gmail.com> wrote: The two methods of createDirectStream appear to have different implementations, the second checks the offset.reset flags and does some error handling while the first doe

Re: Maintaining Kafka Direct API Offsets

2015-08-14 Thread Dan Dutrow
Thanks. Looking at the KafkaCluster.scala code (https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala#L253), it seems a little hacky for me to alter and recompile Spark to expose those methods, so I'll use the receiver API
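
A minimal sketch of the receiver-based fallback mentioned here, where the high-level consumer keeps its offsets in ZooKeeper; the quorum, group id, and topic map are placeholder values:

import org.apache.spark.streaming.kafka.KafkaUtils

val receiverStream = KafkaUtils.createStream(
  ssc,                      // StreamingContext from the first sketch
  "zk1:2181,zk2:2181",      // ZooKeeper quorum used by the high-level consumer
  "my-consumer-group",      // consumer group whose offsets live in ZooKeeper
  Map("my-topic" -> 2))     // topic -> number of consumer threads for the receiver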