Hi Edwin,

I have faced a similar issue, and the behaviour shows up very abruptly. I
even posted a question on Stack Overflow, but there is no solution yet:
https://stackoverflow.com/questions/43496205/spark-job-processing-time-increases-to-4s-without-explanation

In our case, we sometimes saw a constant delay of 4s (which increased to 8s
when we increased the number of executors) whenever we started the job. We
then also observed what is described in the question above: the processing
time increases abruptly.

I read a lot about similar issues, but the answers always suggested that
something else was causing the delay. It feels like an issue with the
Kafka-Spark integration, but I can't say for sure.

Thanks & Regards
Biplob Biswas

On Tue, Jun 20, 2017 at 5:42 AM, Mal Edwin <mal.ed...@vinadionline.com>
wrote:

> Hi All,
>
> I am struggling with an odd issue and would like your help in addressing
> it.
>
>
> *Environment*
>
> AWS Cluster (40 Spark Nodes & 4 node Kafka cluster)
>
> Spark Kafka Streaming submitted in Yarn cluster mode
>
> Kafka - Single topic, 400 partitions
>
> Spark 2.1 on Cloudera
>
> Kafka 10.0 on Cloudera
>
>
> We have zero messages in Kafka and start this Spark job with 100
> executors, each with 14GB of RAM and a single executor core.
>
> The time to process 0 records (end of each batch) is 5 seconds.
>
>
> When we increase the executors to 400, reduce memory to 11GB, and keep
> everything else the same, the time to process 0 records (end of each
> batch) increases 10 times to 50 seconds, and in some cases it goes to 103
> seconds.
>
>
> Spark Streaming configs that we are setting are
>
> Batchwindow = 60 seconds
>
> Backpressure.enabled = true
>
> spark.memory.fraction=0.3 (we store more data in our own data structures)
>
> spark.streaming.kafka.consumer.poll.ms=10000
>
>
> We have tried increasing driver memory to 4GB and driver.cores to 4.
>
>
> If anybody has faced similar issues, please provide some pointers on how
> to address this.
>
>
> Thanks a lot for your time.
>
>
> Regards,
>
> Edwin
>
>
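
For reference, the two runs quoted above could be written down as
spark-submit arguments roughly as follows. This is only a sketch: the
helper names (spark_submit_args, fits_in_batch) are made up for
illustration, the mapping of "Backpressure.enabled" to
spark.streaming.backpressure.enabled is my assumption about which
property was meant, and the driver settings are the values Edwin says he
tried. The 60 second batch window is not a --conf property; it is passed
to the StreamingContext in the job code.

```python
# Sketch only: maps the settings from the quoted mail onto Spark property
# names. Helper names are hypothetical, not part of any Spark API.

BATCH_INTERVAL_S = 60  # "Batchwindow"; set in code via StreamingContext


def spark_submit_args(num_executors, executor_memory_gb):
    """Build a spark-submit argument list for one of the two quoted runs."""
    conf = {
        "spark.executor.instances": str(num_executors),
        "spark.executor.memory": f"{executor_memory_gb}g",
        "spark.executor.cores": "1",    # single executor core
        "spark.driver.memory": "4g",    # value that was tried
        "spark.driver.cores": "4",      # value that was tried
        "spark.streaming.backpressure.enabled": "true",  # assumed mapping
        "spark.memory.fraction": "0.3",
        "spark.streaming.kafka.consumer.poll.ms": "10000",
    }
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def fits_in_batch(processing_time_s, batch_interval_s=BATCH_INTERVAL_S):
    """True if a batch finishes before the next one is due."""
    return processing_time_s < batch_interval_s


small_run = spark_submit_args(100, 14)  # 5 s per empty batch reported
large_run = spark_submit_args(400, 11)  # 50 s, sometimes 103 s reported

if __name__ == "__main__":
    print(" ".join(large_run))
    for seconds in (5, 50, 103):
        print(seconds, fits_in_batch(seconds))
```

With a 60 second batch interval, the reported 5 s and 50 s runs still
finish inside the window, but the 103 s case does not, so batches would
start to queue up even though there are no records to process.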
