Kay Ousterhout created SPARK-5920:
-------------------------------------

             Summary: Use a BufferedInputStream to read local shuffle data
                 Key: SPARK-5920
                 URL: https://issues.apache.org/jira/browse/SPARK-5920
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 1.2.1, 1.3.0
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout

When reading local shuffle data, Spark does not currently buffer the reads into larger chunks. This can lead to very poor disk performance when many tasks concurrently read local data from the same disk. We should use a BufferedInputStream to mitigate this problem. To avoid allocating many in-memory buffers at once for tasks that read shuffle data from a large number of local blocks, we can create each input stream lazily.
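
The lazy-creation idea described in this issue can be sketched roughly as follows. This is only an illustration, not Spark's actual implementation: the class name LazyBufferedFileInputStream, the isOpened() helper, and the choice to treat each block as a whole file are all assumptions made for the example. The point it shows is that the BufferedInputStream (and its buffer) is allocated on the first read rather than when the stream object is constructed.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not a Spark class): wraps a local file in a
 * BufferedInputStream, but defers opening the file and allocating the
 * buffer until the first read.
 */
final class LazyBufferedFileInputStream extends InputStream {
    private final File file;
    private final int bufferSize;
    private InputStream delegate; // stays null until the first read
    private boolean closed = false;

    LazyBufferedFileInputStream(File file, int bufferSize) {
        this.file = file;
        this.bufferSize = bufferSize;
    }

    /** Opens the underlying buffered stream on first use. */
    private InputStream delegate() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        if (delegate == null) {
            delegate = new BufferedInputStream(new FileInputStream(file), bufferSize);
        }
        return delegate;
    }

    /** True once the file has been opened and the buffer allocated. */
    boolean isOpened() {
        return delegate != null;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) {
            delegate.close();
            delegate = null; // release the buffer
        }
    }
}
```

With a wrapper like this, a task that holds streams for a large number of local blocks only pays for one buffer per stream it has actually started reading, and each buffer is released again when its stream is closed.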