Hi all,
I am trying to push my Spark-processed data to a 3-node C* (Cassandra) cluster.
I am pushing 200 million records to Cassandra, and it is taking 2 hours.
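
For scale, here is a small sketch of the arithmetic behind those two figures. The object and method names are only illustrative; the numbers are the ones quoted above. It works out to roughly 27,800 records per second on average.

```scala
object WriteRate {
  // Figures taken from the post: 200 million records in 2 hours.
  val records: Long = 200000000L
  val hours: Double = 2.0

  // Average number of records written per second over the whole job.
  def recordsPerSecond(records: Long, hours: Double): Double =
    records / (hours * 3600.0)

  def main(args: Array[String]): Unit = {
    val rate = recordsPerSecond(records, hours)
    println(f"average rate: $rate%.0f records/s")
  }
}
```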

Below is my Spark cluster configuration:
Nodes : 12
vCores Total : 112
Total memory : 1.5 TB.


spark.cassandra.connection.port 9042
spark.cassandra.output.batch.grouping.buffer.size 1000
spark.cassandra.output.batch.size.bytes 2056
spark.cassandra.output.concurrent.writes 1500
spark.cassandra.output.throughput_mb_per_sec 5
spark.driver.cores 1
spark.driver.extraClassPath .
spark.driver.extraJavaOptions Dlog4j.debug
spark.driver.host nj11mhf0068
spark.driver.memory 2g
spark.driver.memoryOverhead 512
spark.driver.port 54303
spark.dynamicAllocation.enabled TRUE
spark.dynamicAllocation.executorIdleTimeout 180s
spark.dynamicAllocation.maxExecutors 21
spark.dynamicAllocation.minExecutors 20
spark.executor.cores 2
spark.executor.extraJavaOptions Dlog4j.configuration=log4j.properties
spark.executor.id driver
spark.executor.instances 4
spark.executor.memory 4g
spark.executor.memoryOverhead 1024

[image: image.png]


I have repartitioned the Spark DataFrame into 10 partitions, as below:
   val df = df_raw.repartition(numOfPartitions)   // numOfPartitions = 10
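
To put that partition count next to the executor settings listed above, here is a rough sketch of how many write tasks could run at the same time. It assumes Spark's default of one core per task (`spark.task.cpus` is not set in my configuration) and that the 20 executors from `spark.dynamicAllocation.minExecutors` are actually granted; the helper name `concurrentTasks` is only illustrative.

```scala
object WriteParallelism {
  // A stage runs one task per partition, so the number of tasks running at once
  // is bounded both by the partition count and by the available task slots.
  // Assumption: one core per task (the spark.task.cpus default).
  def concurrentTasks(partitions: Int, executors: Int, coresPerExecutor: Int): Int =
    math.min(partitions, executors * coresPerExecutor)

  def main(args: Array[String]): Unit = {
    // From the post: numOfPartitions = 10,
    // spark.dynamicAllocation.minExecutors = 20, spark.executor.cores = 2.
    val tasks = concurrentTasks(partitions = 10, executors = 20, coresPerExecutor = 2)
    println(s"concurrent write tasks: $tasks")
  }
}
```

Under those assumptions there are 40 task slots but only 10 partitions, so at most 10 tasks would be writing at any one time.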

But my application is still very slow.
Can you please help me understand what I am doing wrong here?


Regards,
Shyam
