> > > Hi all, I am trying to push my Spark-processed data to a 3-node C* cluster. Pushing 200 million records to Cassandra is taking 2 hours.
Below is my Spark cluster configuration:

Nodes: 12
vCores total: 112
Total memory: 1.5 TB

spark.cassandra.connection.port 9042
spark.cassandra.output.batch.grouping.buffer.size 1000
spark.cassandra.output.batch.size.bytes 2056
spark.cassandra.output.concurrent.writes 1500
spark.cassandra.output.throughput_mb_per_sec 5
spark.driver.cores 1
spark.driver.extraClassPath .
spark.driver.extraJavaOptions Dlog4j.debug
spark.driver.host nj11mhf0068
spark.driver.memory 2g
spark.driver.memoryOverhead 512
spark.driver.port 54303
spark.dynamicAllocation.enabled TRUE
spark.dynamicAllocation.executorIdleTimeout 180s
spark.dynamicAllocation.maxExecutors 21
spark.dynamicAllocation.minExecutors 20
spark.executor.cores 2
spark.executor.extraJavaOptions Dlog4j.configuration=log4j.properties
spark.executor.id driver
spark.executor.instances 4
spark.executor.memory 4g
spark.executor.memoryOverhead 1024

I have repartitioned the Spark DataFrame into 10 partitions, as below:

    val df = df_raw.repartition(numOfPartitions) // numOfPartitions = 10

But my application is still very slow. Can you please help me understand what I am doing wrong here?

Regards,
Shyam
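
For reference, here is a rough sanity check of the figures in the post, written as a minimal Scala sketch. It only does arithmetic on the reported numbers (200 million records, 2 hours, 3 Cassandra nodes, 10 partitions, a 5 MB/s throughput setting). The object and value names are hypothetical. Two things are assumptions rather than facts from the post: that the 10 partitions are written concurrently and evenly, and that the `spark.cassandra.output.throughput_mb_per_sec` limit applies per write task rather than to the whole job.

```scala
object WriteRateEstimate {
  // Figures reported in the post.
  val totalRecords: Long = 200000000L
  val elapsedSeconds: Long = 2L * 60 * 60
  val cassandraNodes: Int = 3
  val dataFramePartitions: Int = 10
  val throughputCapMBPerSec: Double = 5.0

  // Observed overall write rate.
  val recordsPerSecond: Double = totalRecords.toDouble / elapsedSeconds

  // Assumption: load is spread evenly over the Cassandra nodes
  // (replication is ignored here).
  val recordsPerSecondPerNode: Double = recordsPerSecond / cassandraNodes

  // Assumption: all partitions are written at the same time, one task each.
  val recordsPerSecondPerTask: Double = recordsPerSecond / dataFramePartitions

  // Assumption: the throughput limit applies per write task, so the job-wide
  // ceiling is the limit multiplied by the number of concurrent tasks.
  val aggregateCapMBPerSec: Double = throughputCapMBPerSec * dataFramePartitions

  // Average row size (bytes, taking 1 MB = 1,000,000 bytes) at which that
  // ceiling alone would account for the full 2 hours.
  val breakEvenRowBytes: Double =
    aggregateCapMBPerSec * 1e6 * elapsedSeconds / totalRecords

  def main(args: Array[String]): Unit = {
    println(f"overall:         $recordsPerSecond%.0f records/s")
    println(f"per C* node:     $recordsPerSecondPerNode%.0f records/s")
    println(f"per write task:  $recordsPerSecondPerTask%.0f records/s")
    println(f"aggregate cap:   $aggregateCapMBPerSec%.0f MB/s")
    println(f"break-even row:  $breakEvenRowBytes%.0f bytes")
  }
}
```

Under these assumptions the job is writing about 27,800 records per second overall. The throughput setting by itself would explain the 2 hours only if rows averaged around 1,800 bytes each. With smaller rows, the limit is more likely somewhere else, for example in the number of concurrent write tasks that 10 partitions allow.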