[ 
https://issues.apache.org/jira/browse/HDFS-15692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

István Fajth updated HDFS-15692:
--------------------------------
    Summary: Improve fuse_dfs read performace  (was: Improve furse_dfs read 
performace)

> Improve fuse_dfs read performace
> --------------------------------
>
>                 Key: HDFS-15692
>                 URL: https://issues.apache.org/jira/browse/HDFS-15692
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: fuse-dfs
>            Reporter: István Fajth
>            Priority: Major
>
> Currently fuse_dfs uses a prefetch buffer to read from HDFS via libhdfs' 
> pread method.
> The algorithm inside fuse_read.c in short does the following:
>  if the rdbuffer size is less than the size of the buffer provided
>  then
>    read directly into the supplied buffer
>  else
>    grab lock
>      if the prefetch buffer does not hold the requested data
>      then
>        fill the prefetch buffer
>      endif
>      fill the supplied buffer via memcpy from the prefetch buffer
>    release lock
>  endif
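
For illustration, the steps above can be sketched in C roughly as follows. This is not the fuse_dfs source: backend_pread is a hypothetical in-memory stand-in for a positioned read such as libhdfs' pread, and RDBUFFER_SIZE, file_handle_t and sketch_read are names made up for this sketch. The buffer is kept tiny so the behavior is easy to follow.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a positioned read (not the libhdfs API):
 * copies up to len bytes starting at off from an in-memory "file". */
typedef struct {
    const char *data;
    size_t size;
    int calls;              /* number of backend reads issued */
} backend_t;

static size_t backend_pread(backend_t *b, size_t off, char *dst, size_t len)
{
    b->calls++;
    if (off >= b->size) return 0;
    if (len > b->size - off) len = b->size - off;
    memcpy(dst, b->data + off, len);
    return len;
}

#define RDBUFFER_SIZE 16    /* assumed value, tiny for illustration */

typedef struct {
    backend_t *backend;
    pthread_mutex_t lock;
    char buf[RDBUFFER_SIZE]; /* the prefetch buffer */
    size_t buf_start;        /* file offset of buf[0] */
    size_t buf_len;          /* valid bytes in buf */
} file_handle_t;

static void fh_init(file_handle_t *fh, backend_t *b)
{
    fh->backend = b;
    fh->buf_start = 0;
    fh->buf_len = 0;
    pthread_mutex_init(&fh->lock, NULL);
}

static size_t sketch_read(file_handle_t *fh, char *dst, size_t size, size_t off)
{
    /* Large request: bypass the prefetch buffer (one call here; a real
     * implementation would loop on short reads). */
    if (RDBUFFER_SIZE < size)
        return backend_pread(fh->backend, off, dst, size);

    pthread_mutex_lock(&fh->lock);
    int miss = fh->buf_len == 0 || off < fh->buf_start ||
               off + size > fh->buf_start + fh->buf_len;
    if (miss) {
        /* Refill the prefetch buffer starting at the requested offset. */
        fh->buf_len = backend_pread(fh->backend, off, fh->buf, RDBUFFER_SIZE);
        fh->buf_start = off;
    }
    assert(off >= fh->buf_start);
    size_t avail = fh->buf_len - (off - fh->buf_start);
    size_t n = size < avail ? size : avail;
    memcpy(dst, fh->buf + (off - fh->buf_start), n);
    pthread_mutex_unlock(&fh->lock);
    return n;
}
```

In this sketch every buffer miss blocks the caller for a full backend read while holding the lock, which is the stall the proposal below aims to hide.
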
> It would be nice to have a background thread and two prefetch buffers, so 
> that while one buffer serves the reads coming from the local client, the 
> other can prefetch the next data. With that we can improve the read speed, 
> especially with EC encoded files.
> According to some measurements I did, increasing the read buffer changes the 
> runtime significantly: with 64MB the runtime is much closer to that of the 
> dfs client. Interestingly, 128MB as the buffer size does not perform well, 
> but 256MB is even closer to what the dfs client can provide (dfs vs fuse: 
> 16 vs 18 seconds with rep3 files, and on par with EC encoded files).
> So it seems worthwhile to continuously stream a larger chunk of data, at 
> least with pread. With a separate fetching thread and double buffering, we 
> would not even need positioned reads, just continuous streaming of data 
> with read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

