Kay Ousterhout created SPARK-5920:
-------------------------------------

             Summary: Use a BufferedInputStream to read local shuffle data
                 Key: SPARK-5920
                 URL: https://issues.apache.org/jira/browse/SPARK-5920
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 1.2.1, 1.3.0
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout


When reading local shuffle data, Spark doesn't currently buffer the local reads
into larger chunks. This can lead to very poor disk performance when many tasks
are concurrently reading local data from the same disk, because the disk must
then service many small, interleaved reads. We should use a BufferedInputStream
to mitigate this problem. We can create the input stream lazily to avoid
allocating many in-memory buffers at the same time for tasks that read shuffle
data from a large number of local blocks.
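A minimal sketch of the proposed approach follows. It assumes each local shuffle
block is a byte range [offset, offset + length) of a file on local disk. The
class name LazyBufferedSegmentStream and the 64 KiB buffer size are hypothetical
choices made for illustration, and the sketch is not Spark's actual code. The
file is opened, and the BufferedInputStream buffer allocated, only on the first
read. A task can therefore hold streams for many local blocks without allocating
all of their buffers at once.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch: reads one block stored as the segment
 * [offset, offset + length) of a local file. The file and its
 * BufferedInputStream are created lazily, on the first read.
 */
final class LazyBufferedSegmentStream extends InputStream {
  // Assumed buffer size for illustration; not taken from Spark.
  private static final int BUFFER_SIZE = 64 * 1024;

  private final File file;
  private final long offset;
  private long remaining; // bytes of the segment not yet returned to the caller
  private InputStream in; // null until the first read

  LazyBufferedSegmentStream(File file, long offset, long length) {
    this.file = file;
    this.offset = offset;
    this.remaining = length;
  }

  // Opens the file, seeks to the segment start, and wraps it in a buffer.
  private InputStream stream() throws IOException {
    if (in == null) {
      FileInputStream fis = new FileInputStream(file);
      try {
        // The channel shares its position with the FileInputStream.
        fis.getChannel().position(offset);
      } catch (IOException e) {
        fis.close();
        throw e;
      }
      in = new BufferedInputStream(fis, BUFFER_SIZE);
    }
    return in;
  }

  @Override
  public int read() throws IOException {
    if (remaining <= 0) return -1;
    int b = stream().read();
    if (b >= 0) remaining--;
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    if (len == 0) return 0;
    if (remaining <= 0) return -1;
    // Never read past the end of this block's segment.
    int n = stream().read(buf, off, (int) Math.min(len, remaining));
    if (n > 0) remaining -= n;
    return n;
  }

  @Override
  public void close() throws IOException {
    if (in != null) in.close();
  }

  // True once the underlying file has been opened and the buffer allocated.
  boolean isOpened() {
    return in != null;
  }
}
```

With the buffer in place, a caller that reads a few bytes at a time is served
from one larger disk read of up to BUFFER_SIZE bytes. This reduces the number
of small reads that concurrent tasks issue against the same disk.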



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
