Hi All, I am struggling with an odd issue and would like your help in addressing it.

Environment:
- AWS cluster (40 Spark nodes and a 4-node Kafka cluster)
- Spark Kafka Streaming job submitted in YARN cluster mode
- Kafka: single topic, 400 partitions
- Spark 2.1 on Cloudera
- Kafka 10.0 on Cloudera

There are zero messages in Kafka. We start the Spark job with 100 executors, each with 14 GB of RAM and a single executor core. The time to process 0 records (at the end of each batch) is 5 seconds.

When we increase the number of executors to 400 and keep everything else the same, except for reducing executor memory to 11 GB, the time to process 0 records (at the end of each batch) increases tenfold to 50 seconds, and in some cases it reaches 103 seconds.

The Spark Streaming configs we are setting are:
- Batch window = 60 seconds
- spark.streaming.backpressure.enabled = true
- spark.memory.fraction = 0.3 (we store more data in our own data structures)
- spark.streaming.kafka.consumer.poll.ms = 10000

We have tried increasing driver memory to 4 GB and driver cores to 4.

If anybody has faced a similar issue, please provide some pointers on how to address it.

Thanks a lot for your time.

Regards,
Edwin