Is there guidance for configuring Flink on large clusters? I have recently
been working to benchmark and test some algorithms on AWS. I had no issues
running on a 16 node cluster but when moving to 64 nodes the JobManager
struggled mightily. It did not appear to be parallelizing its workload. I was
in the process of modifying my code to reduce the parallelism of earlier,
smaller operations when I lost the cluster due to a spot price increase.

The instances were c3.8xlarge (32 vCPUs each). In the larger cluster one
instance hosted the JobManager, so the parallelism was 63 * 32 = 2016. The
small cluster had a parallelism of 512.

I have seen the blog posts describing the performance of 640-core clusters
on GCE. Is this a known limitation, or can Flink scale much further?

http://data-artisans.com/computing-recommendations-at-extreme-scale-with-apache-flink/

http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/

Thanks,
Greg
