Hi all, in my case I have a huge HashMap[(Int, Long), (Double, Double, Double)], somewhere from several GB up to tens of GB. After each iteration I need to collect() this HashMap, perform some calculations on it, and then broadcast() it back to every node. Each executor has 20 GB of memory, but after the collect() the job gets stuck at "Added rdd_xx_xx" and nothing further shows up in the Application UI.
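For reference, the loop looks roughly like this (a simplified sketch, not my exact code; `initialTable`, `records`, `compute`, and `merge` are placeholder names, and `sc` is an existing SparkContext):

```scala
import scala.collection.Map

// Driver-side table: (Int, Long) -> (Double, Double, Double)
var table: Map[(Int, Long), (Double, Double, Double)] = initialTable()

for (i <- 1 to numIterations) {
  val bcast = sc.broadcast(table)              // ship the table to every executor
  val updated = records
    .map(rec => compute(rec, bcast.value))     // each task reads the broadcast copy
    .reduceByKey(merge)
  table = updated.collectAsMap()               // pull the whole map back to the driver
  bcast.unpersist()                            // drop the previous broadcast blocks
}
```

The hang happens at the collectAsMap()/broadcast() step once the table grows past a couple of GB.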
I've tried lowering spark.shuffle.memoryFraction and spark.storage.memoryFraction, but it seems this setup can only handle a HashMap of up to about 2 GB. What should I tune or restructure for a workload like this? Thanks