[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-5920:
-----------------------------------
    Priority: Critical  (was: Major)

> Use a BufferedInputStream to read local shuffle data
> ----------------------------------------------------
>
>                 Key: SPARK-5920
>                 URL: https://issues.apache.org/jira/browse/SPARK-5920
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.3.0, 1.2.1
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Critical
>
> When reading local shuffle data, Spark doesn't currently buffer the local
> reads into larger chunks, which can lead to terrible disk performance if many
> tasks are concurrently reading local data from the same disk. We should use
> a BufferedInputStream to mitigate this problem; we can lazily create the
> input stream to avoid allocating a bunch of in-memory buffers at the same
> time for tasks that read shuffle data from a large number of local blocks.
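
To illustrate the idea in the issue description, here is a minimal Java sketch of a lazily created buffered stream. It is not Spark's actual patch: the class name LazyBufferedFileInputStream, the isOpened() helper, and the buffer size are all hypothetical, chosen only to show that neither the file handle nor the in-memory buffer needs to exist until a task actually starts reading a given local block.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not a Spark class): wraps a local file in a
 * BufferedInputStream, but only on the first read, so that constructing
 * many of these for many local blocks allocates no buffers up front.
 */
class LazyBufferedFileInputStream extends InputStream {
    private final File file;
    private final int bufferSize;
    private InputStream delegate;   // stays null until the first read
    private boolean closed = false;

    LazyBufferedFileInputStream(File file, int bufferSize) {
        this.file = file;
        this.bufferSize = bufferSize;
    }

    /** True once the underlying file has been opened and the buffer allocated. */
    boolean isOpened() {
        return delegate != null;
    }

    // Opens the file and allocates the buffer on demand.
    private InputStream delegate() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        if (delegate == null) {
            delegate = new BufferedInputStream(new FileInputStream(file), bufferSize);
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Small reads are served from the buffer; the disk sees larger chunks.
        return delegate().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

With this shape, a task that holds streams for a large number of local blocks pays for one buffer per block it is currently reading, rather than one per block it will eventually read. The buffer size is a tuning choice that trades memory for fewer, larger disk reads.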