Dear Spark developers: We are benchmarking Spark operations such as filter, group, and join on SSD-backed i2.2xlarge instances on EC2. Most operations perform similarly to, or slightly better than, the same operations on EC2 ephemeral disks. However, the group operation performs much worse on SSD than on regular disks, at least 2x to 3x slower. Could any of you shed some light on this behavior?
Thanks a lot, -chen
