Apache Spark commented on SPARK-21113:

User 'sitalkedia' has created a pull request for this issue:

> Support for read ahead input stream to amortize disk IO cost in the Spill 
> reader
> --------------------------------------------------------------------------------
>                 Key: SPARK-21113
>                 URL: https://issues.apache.org/jira/browse/SPARK-21113
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.2
>            Reporter: Sital Kedia
>            Assignee: Sital Kedia
>            Priority: Minor
>             Fix For: 2.3.0
> Profiling some of our big jobs, we see that around 30% of the time is being 
> spent reading the spill files from disk. To amortize the disk I/O cost, the 
> idea is to implement a read-ahead input stream which asynchronously reads 
> ahead from the underlying input stream once a specified amount of data has 
> been read from the current buffer. It does this by maintaining two buffers: 
> an active buffer and a read-ahead buffer. The active buffer contains the 
> data that should be returned when a read() call is issued. The read-ahead 
> buffer is used to asynchronously read from the underlying input stream, and 
> once the active buffer is exhausted, we flip the two buffers so that we can 
> start reading from the read-ahead buffer without being blocked on disk I/O.
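
The two-buffer scheme described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: the class name ReadAheadSketch, the constructor parameters (bufferSize, threshold), and the use of a single-thread executor are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical double-buffered read-ahead stream; names and structure are assumptions. */
class ReadAheadSketch extends InputStream {
    private final InputStream in;
    private final int threshold;          // bytes consumed from the active buffer before read-ahead starts
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "read-ahead-sketch");
        t.setDaemon(true);                // do not keep the JVM alive if close() is never called
        return t;
    });
    private byte[] active;                // data returned by read()
    private byte[] readAhead;             // filled in the background
    private int pos = 0;                  // next byte to return from the active buffer
    private int limit = 0;                // number of valid bytes in the active buffer
    private Future<Integer> pending;      // in-flight fill of readAhead, or null
    private boolean eof = false;

    ReadAheadSketch(InputStream in, int bufferSize, int threshold) {
        this.in = in;
        this.threshold = threshold;
        this.active = new byte[bufferSize];
        this.readAhead = new byte[bufferSize];
        startReadAhead();                 // both buffers are empty, so begin filling right away
    }

    /** Schedules one background fill of the read-ahead buffer, unless one is in flight or EOF was seen. */
    private void startReadAhead() {
        if (pending != null || eof) return;
        final byte[] target = readAhead;
        Callable<Integer> fill = () -> in.read(target, 0, target.length);
        pending = executor.submit(fill);
    }

    /** Waits for the in-flight fill and flips the buffers; returns false at end of stream. */
    private boolean flip() throws IOException {
        if (pending == null) return false;
        int n;
        try {
            n = pending.get();            // blocks only if the background read has not finished yet
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        pending = null;
        if (n <= 0) { eof = true; return false; }
        byte[] tmp = active; active = readAhead; readAhead = tmp;
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        if (pos == limit) {               // active buffer exhausted
            startReadAhead();             // no-op if a fill is already in flight or at EOF
            if (!flip()) return -1;
        }
        int b = active[pos++] & 0xFF;
        if (pos >= threshold) startReadAhead();  // enough consumed: refill the other buffer
        return b;
    }

    @Override
    public void close() throws IOException {
        executor.shutdownNow();
        in.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        int count = 0;
        try (InputStream s = new ReadAheadSketch(new ByteArrayInputStream(data), 64, 32)) {
            while (s.read() != -1) count++;
        }
        System.out.println("read " + count + " bytes");
    }
}
```

In this sketch the underlying stream is only ever touched by the background thread, and the caller blocks only when the active buffer runs out before the pending fill has completed. A real implementation would also need a bulk read(byte[], int, int) path and more careful handling of close() while a fill is in flight.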

This message was sent by Atlassian JIRA

