Kay Ousterhout created SPARK-5920:
-------------------------------------
Summary: Use a BufferedInputStream to read local shuffle data
Key: SPARK-5920
URL: https://issues.apache.org/jira/browse/SPARK-5920
Project: Spark
Issue Type: Improvement
Components: Shuffle
Affects Versions: 1.2.1, 1.3.0
Reporter: Kay Ousterhout
Assignee: Kay Ousterhout
When reading local shuffle data, Spark does not currently buffer the reads
into larger chunks. This can lead to very poor disk performance when many
tasks are concurrently reading local data from the same disk, since the disk
ends up servicing many small interleaved reads. We should use a
BufferedInputStream to mitigate this problem. To avoid allocating many
in-memory buffers at the same time for tasks that read shuffle data from a
large number of local blocks, we can create the input stream lazily, only
when a block is first read.
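
A rough sketch of the lazy-creation idea, for illustration only. The class
name LazyBufferedFileInputStream, the isOpened() helper, and the buffer size
parameter are hypothetical and are not Spark's actual classes or settings.
The sketch also assumes each block is a whole file, which may not match how
shuffle blocks are laid out on disk. The file is opened and the buffer
allocated only on the first read, so a task holding streams for many local
blocks pays for a buffer only for the blocks it is actively reading.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a Spark class): defers opening the file and
// allocating the read buffer until the first read call.
final class LazyBufferedFileInputStream extends InputStream {
    private final File file;
    private final int bufferSize;
    private InputStream delegate; // stays null until the first read
    private boolean closed = false;

    LazyBufferedFileInputStream(File file, int bufferSize) {
        this.file = file;
        this.bufferSize = bufferSize;
    }

    // True once the underlying buffered stream has been created.
    boolean isOpened() {
        return delegate != null;
    }

    private InputStream delegate() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        if (delegate == null) {
            // The buffer is allocated here, not in the constructor.
            delegate = new BufferedInputStream(new FileInputStream(file), bufferSize);
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

Constructing the stream is cheap; for example,
new LazyBufferedFileInputStream(file, 64 * 1024) allocates nothing until the
first read. The 64 KB figure is an arbitrary illustrative value, not a
recommended setting.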