Cody, sorry I didn’t get back to you sooner; I didn’t see the response come through. I was looking at the Spark UI. I’ll see if I can recreate the issue with version 1.5.2. Thanks.
From: Cody Koeninger <c...@koeninger.org>
Date: Friday, October 16, 2015 at 12:48
To: robert towne <robert.to...@webtrends.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Problems w/YARN Spark Streaming app reading from Kafka

What do you mean by "the current documentation states it isn’t used"? http://spark.apache.org/docs/latest/configuration.html still lists the value and its meaning.

As far as the issue you're seeing, are you measuring records by looking at logs, the Spark UI, or actual downstream sinks of data? I don't think the backpressure code does any logging, but KafkaRDD will log at info level the offsets for each topic-partition that is computed (the message starts with "Computing topic").

On Fri, Oct 16, 2015 at 1:52 PM, Robert Towne <robert.to...@webtrends.com> wrote:

I have a Spark Streaming app that reads from Kafka using a receiver-less direct connection (KafkaUtils.createDirectStream) with a batch interval of 1 minute.

For about 15 hours it was running fine, with batch input sizes ranging from 16,836 to 3,861,758 events. Then, about 3 hours ago, every one-minute batch began bringing in exactly the same number of records: 5,760 (2 topics; topic 1 has 64 partitions, topic 2 has 32). I know there is more data than the 5,760 records being piped in, and eventually we’ll fall so far behind that our Kafka offsets will no longer be available.

It seems odd that 5,760 / 96 (partitions) = 60, which is my batch interval in seconds.

I do have spark.streaming.backpressure.enabled = true, and even though the current documentation states it isn’t used, I have a value set for spark.streaming.kafka.maxRatePerPartition.

Has anyone else seen this issue where the rate seems capped even though it should be pulling more data?

Thanks,
Robert
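The arithmetic behind Robert's observation can be sketched as follows. This is just a back-of-the-envelope check (not Spark code): if each batch delivers a fixed 5,760 records across 96 partitions on a 60-second interval, the implied throughput works out to exactly 1 record per second per partition, which is what makes the cap look suspicious.

```python
# Reproduce the arithmetic from the report: every 1-minute batch pulls
# a fixed 5,760 records across 96 Kafka partitions (64 + 32).
batch_interval_s = 60
partitions = 64 + 32            # topic 1 (64) + topic 2 (32)
records_per_batch = 5760

records_per_partition = records_per_batch / partitions
rate_per_partition = records_per_partition / batch_interval_s

print(records_per_partition)    # 60.0 -- equal to the batch interval in seconds
print(rate_per_partition)       # 1.0  -- an effective cap of 1 record/sec/partition
```

That 1 record/sec/partition figure suggests the backpressure rate estimator has driven the ingest rate down to some floor value, rather than spark.streaming.kafka.maxRatePerPartition being the limit.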