Hey Cody, I would have responded to the mailing list, but it looks like this
thread got aged off. I have the problem where one of my topics dumps more
data than my Spark job can keep up with. We limit the input rate with
maxRatePerPartition. Eventually, when the data is aged off, I get the
OffsetOutOfRangeException.
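For context, the throttle is just the standard config knob; a sketch (the
value and app name here are illustrative, not what we actually run):

import org.apache.spark.SparkConf

// cap the direct stream's intake from each Kafka partition
val conf = new SparkConf()
  .setAppName("kafka-direct-job") // app name is hypothetical
  .set("spark.streaming.kafka.maxRatePerPartition", "10000") // max messages/sec/partition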
> offset ranges. See the definition of count() in recent versions of
> KafkaRDD for an example.
>
> On Wed, Dec 9, 2015 at 9:39 AM, Dan Dutrow <dan.dut...@gmail.com> wrote:
>
>> Is there documentation for how to update the metrics (#messages per
>> batch) in the Spark Streaming tab when using the Direct API?
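If I follow that count() suggestion, pulling a per-batch message count out of
the offset ranges would look roughly like this (a sketch against the Spark 1.x
direct API; directStream stands in for whatever createDirectStream returned):

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

directStream.foreachRDD { rdd =>
  // each RDD from the direct stream carries the Kafka offset range it covers
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val messagesThisBatch = ranges.map(r => r.untilOffset - r.fromOffset).sum
  println(s"messages this batch: $messagesThisBatch")
}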
Is there documentation for how to update the metrics (#messages per batch)
in the Spark Streaming tab when using the Direct API? Does the Streaming
tab get its information from Zookeeper or something else internally?
--
Dan ✆
Hey Cody, I'm convinced that I'm not going to get the functionality I want
without using the Direct Stream API.
I'm now looking through
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md#exactly-once-using-transactional-writes
where you say "For the very first time the job is ...".
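If I'm reading the transactional-writes recipe right, it amounts to something
like this (my own sketch, not the post's code; the JDBC URL and table names
are invented and the tables are assumed to already exist):

import java.sql.DriverManager
import org.apache.spark.streaming.kafka.HasOffsetRanges

directStream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val count = rdd.count() // stand-in for the real per-batch results
  val conn = DriverManager.getConnection("jdbc:h2:mem:demo") // URL is illustrative
  conn.setAutoCommit(false)
  try {
    val res = conn.prepareStatement("INSERT INTO batch_counts(cnt) VALUES (?)")
    res.setLong(1, count)
    res.executeUpdate()
    val off = conn.prepareStatement(
      "UPDATE offsets SET until_offset = ? WHERE topic = ? AND part = ?")
    for (r <- ranges) {
      off.setLong(1, r.untilOffset)
      off.setString(2, r.topic)
      off.setInt(3, r.partition)
      off.executeUpdate()
    }
    conn.commit() // results and offsets commit atomically, or not at all
  } catch {
    case e: Exception => conn.rollback(); throw e
  } finally {
    conn.close()
  }
}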
Sigh... I want to use the direct stream, and I've recently brought in Redis
to persist the offsets, but I really need the realtime metrics on the GUI,
so I'm hoping to have both the Direct and Receiver streams working.
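What I have so far on the Redis side is roughly the following (a sketch:
Jedis as the client and the key layout are my own choices):

import kafka.common.TopicAndPartition
import org.apache.spark.streaming.kafka.HasOffsetRanges
import redis.clients.jedis.Jedis
import scala.collection.JavaConverters._

val jedis = new Jedis("localhost") // address is illustrative

// after each batch, record the ending offset of every partition
directStream.foreachRDD { rdd =>
  rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach { r =>
    jedis.hset(s"offsets:${r.topic}", r.partition.toString, r.untilOffset.toString)
  }
}

// on restart, rebuild the fromOffsets argument for createDirectStream
def loadOffsets(topic: String): Map[TopicAndPartition, Long] =
  jedis.hgetAll(s"offsets:$topic").asScala.map { case (part, off) =>
    TopicAndPartition(topic, part.toInt) -> off.toLong
  }.toMap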
On Wed, Dec 2, 2015 at 3:17 PM Cody Koeninger wrote:
> auto.offset.reset is only useful for determining whether to start the
> stream at the beginning or the end of the log... if someone's provided
> explicit offsets, it's pretty clear where to start the stream.
>
> On Tue, Sep 8, 2015 at 1:19 PM, Dan Dutrow <dan.dut...@gmail.com> wrote:
>
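My reading of that point, as a sketch (the broker address is made up): with no
explicit offsets the reset flag picks the starting point, and with explicit
offsets it is ignored.

// no explicit offsets: auto.offset.reset decides where the stream starts
val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092",
  "auto.offset.reset" -> "smallest" // "smallest" = start of log, "largest" = end
)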
The two methods of createDirectStream appear to have different
implementations: the second checks the offset.reset flags and does some
error handling, while the first does not. Besides the use of a
messageHandler, are they intended to be used in different situations?
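For reference, the two signatures as they appear in Spark 1.x's KafkaUtils
(paraphrased from the source, so check against your version):

// overload 1: topics only; the starting position comes from auto.offset.reset
def createDirectStream[
    K: ClassTag,
    V: ClassTag,
    KD <: Decoder[K]: ClassTag,
    VD <: Decoder[V]: ClassTag](
      ssc: StreamingContext,
      kafkaParams: Map[String, String],
      topics: Set[String]): InputDStream[(K, V)]

// overload 2: explicit starting offsets plus a messageHandler for each record
def createDirectStream[
    K: ClassTag,
    V: ClassTag,
    KD <: Decoder[K]: ClassTag,
    VD <: Decoder[V]: ClassTag,
    R: ClassTag](
      ssc: StreamingContext,
      kafkaParams: Map[String, String],
      fromOffsets: Map[TopicAndPartition, Long],
      messageHandler: MessageAndMetadata[K, V] => R): InputDStream[R]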
> On Tue, Sep 8, 2015 at 11:44 AM, Dan Dutrow <dan.dut...@gmail.com> wrote:
>
>> The two methods of createDirectStream appear to have different
>> implementations: the second checks the offset.reset flags and does some
>> error handling, while the first does not.
Thanks. Looking at the KafkaCluster.scala code
(https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala#L253),
it seems a little hacky for me to alter and recompile Spark to expose those
methods, so I'll use the receiver API.
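For the archive, one untested alternative to recompiling: the usual
same-package trick, where a helper compiled into our own jar but declared in
Spark's package can see private[spark] members (method names below are from
the KafkaCluster source linked above):

// lives in our jar, but in Spark's package, so private[spark] is visible
package org.apache.spark.streaming.kafka

import kafka.common.TopicAndPartition

object KafkaClusterAccess {
  // sketch: latest offset for each partition of the given topics
  def latestOffsets(brokers: String, topics: Set[String]): Map[TopicAndPartition, Long] = {
    val kc = new KafkaCluster(Map("metadata.broker.list" -> brokers))
    val partitions = kc.getPartitions(topics).right.get // .right.get for brevity only
    kc.getLatestLeaderOffsets(partitions).right.get.mapValues(_.offset).toMap
  }
}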