Hi, I have a very, very simple streaming job. When I deploy it on the exact same cluster, with the exact same parameters, I see a big (~40%) performance difference between "client" and "cluster" deploy mode. This seems a bit surprising. Is this expected?
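Concretely, the only thing that changes between the two runs is the --deploy-mode flag passed to spark-submit; roughly this (master, class and jar names below are placeholders):

    spark-submit --master spark://<master>:7077 --deploy-mode client \
      --class com.example.StreamingJob streaming-job.jar

    spark-submit --master spark://<master>:7077 --deploy-mode cluster \
      --class com.example.StreamingJob streaming-job.jar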
The streaming job is:

    val msgStream = kafkaStream
      .map { case (k, v) => v }
      .map(DatatypeConverter.printBase64Binary)

    msgStream.foreachRDD(save)

where save writes each batch with saveAsTextFile("s3n://some.bucket/path", classOf[LzoCodec]).

I tried several times, but the job deployed in "client" mode can only write at about 60% of the throughput of the job deployed in "cluster" mode, and this happens consistently. I'm logging at INFO level, but my application code doesn't log anything, so it's only Spark logs, and the amount of logging I see in "client" mode doesn't seem crazy.

The setup is:

    spark-ec2 [...] \
      --copy-aws-credentials \
      --instance-type=m3.2xlarge \
      -s 2 launch test_cluster

and all the deployment was done from the master machine.
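For completeness, save is just a thin wrapper around the saveAsTextFile call above; roughly this (assuming LzoCodec is the hadoop-lzo codec):

    import javax.xml.bind.DatatypeConverter      // printBase64Binary used in the map above
    import org.apache.spark.rdd.RDD
    import com.hadoop.compression.lzo.LzoCodec   // LZO compression codec from hadoop-lzo

    // Write one batch of base64-encoded messages to S3 as LZO-compressed text.
    def save(rdd: RDD[String]): Unit =
      rdd.saveAsTextFile("s3n://some.bucket/path", classOf[LzoCodec])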