[ https://issues.apache.org/jira/browse/HDFS-15692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
István Fajth updated HDFS-15692:
--------------------------------
Summary: Improve fuse_dfs read performace (was: Improve furse_dfs read
performace)
> Improve fuse_dfs read performace
> --------------------------------
>
> Key: HDFS-15692
> URL: https://issues.apache.org/jira/browse/HDFS-15692
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: fuse-dfs
> Reporter: István Fajth
> Priority: Major
>
> Currently fuse_dfs uses a prefetch buffer to read from HDFS via libhdfs'
> pread method.
> In short, the algorithm in fuse_read.c does the following:
> if the rdbuffer size is less than the size of the supplied buffer
> then
>     read directly into the supplied buffer
> else
>     grab the lock
>     if the prefetch buffer does not hold more data
>     then
>         fill the prefetch buffer
>     endif
>     fill the supplied buffer via memcpy from the prefetch buffer
>     release the lock
> endif
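A minimal C sketch of the read path described above, under stated assumptions: `source_pread()` is a hypothetical in-memory stand-in for libhdfs' `hdfsPread()`, `RDBUFFER_SIZE` is a hypothetical (deliberately tiny) buffer size, and `prefetch_handle` / `sketch_read()` are illustrative names, not the actual structures in fuse_read.c.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the HDFS file: an in-memory byte string. */
static const char *source_data = "0123456789abcdefghijklmnopqrstuvwxyz";

/* Hypothetical stand-in for libhdfs' hdfsPread(): a positioned read that
 * returns the number of bytes copied, 0 at end of file, -1 on error. */
static ssize_t source_pread(off_t offset, char *dst, size_t len) {
    size_t total = strlen(source_data);
    if (offset < 0)
        return -1;
    if ((size_t)offset >= total)
        return 0;
    if (len > total - (size_t)offset)
        len = total - (size_t)offset;
    memcpy(dst, source_data + offset, len);
    return (ssize_t)len;
}

#define RDBUFFER_SIZE 8 /* hypothetical; tiny so the example is easy to trace */

typedef struct {
    pthread_mutex_t lock;
    char buf[RDBUFFER_SIZE];
    off_t buf_off;  /* file offset of buf[0] */
    size_t buf_len; /* number of valid bytes in buf */
} prefetch_handle;

static void prefetch_init(prefetch_handle *h) {
    pthread_mutex_init(&h->lock, NULL);
    h->buf_off = 0;
    h->buf_len = 0;
}

static ssize_t sketch_read(prefetch_handle *h, char *out, size_t size,
                           off_t offset) {
    /* Requests larger than the prefetch buffer bypass it entirely. */
    if (RDBUFFER_SIZE < size)
        return source_pread(offset, out, size);

    pthread_mutex_lock(&h->lock);
    /* Refill when the buffer does not cover [offset, offset + size). */
    if (offset < h->buf_off ||
        offset + (off_t)size > h->buf_off + (off_t)h->buf_len) {
        ssize_t n = source_pread(offset, h->buf, RDBUFFER_SIZE);
        if (n < 0) {
            pthread_mutex_unlock(&h->lock);
            return -1;
        }
        h->buf_off = offset;
        h->buf_len = (size_t)n;
    }
    /* Serve the request with memcpy; the result is short only at EOF. */
    assert(offset >= h->buf_off);
    size_t avail = h->buf_len - (size_t)(offset - h->buf_off);
    size_t n = size < avail ? size : avail;
    memcpy(out, h->buf + (offset - h->buf_off), n);
    pthread_mutex_unlock(&h->lock);
    return (ssize_t)n;
}
```

Every refill in this sketch happens synchronously while the caller holds the lock, so a FUSE read that misses the buffer waits for the whole positioned-read round trip; that stall is what the double-buffering idea tries to hide.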
> It would be nice to have a background thread and double prefetch buffers, so
> that while one buffer serves the reads coming from the local client, the
> other can prefetch the next chunk of data. This should improve read speed,
> especially with EC-encoded files.
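The double-buffering idea could look roughly like the following sketch. It assumes one background fetcher thread per open file, two fixed-size buffers handed back and forth under a mutex and condition variable, and a hypothetical in-memory source standing in for an HDFS input stream; all names (`double_buf`, `fetcher()`, `db_read()`, `CHUNK`) are hypothetical. The fetcher fills one buffer outside the lock while the reader drains the other, which is where the overlap between fetching and serving comes from.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define CHUNK 8 /* hypothetical size of each of the two prefetch buffers */

typedef struct {
    char data[CHUNK];
    size_t len;  /* valid bytes; 0 marks end of stream */
    size_t pos;  /* bytes already handed to the reader */
    int filled;  /* 1: owned by the reader, 0: owned by the fetcher */
} chunk_buf;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    chunk_buf bufs[2];
    int cur;                 /* buffer the reader drains next */
    int stop;                /* set by db_close() to end the fetcher early */
    const char *src;         /* hypothetical stand-in for an HDFS stream */
    size_t src_len, src_pos; /* touched only by the fetcher thread */
    pthread_t thread;
} double_buf;

/* Background thread: alternately fills bufs[0] and bufs[1]. */
static void *fetcher(void *arg) {
    double_buf *d = arg;
    for (int i = 0;; i ^= 1) {
        chunk_buf *b = &d->bufs[i];
        pthread_mutex_lock(&d->lock);
        while (b->filled && !d->stop)
            pthread_cond_wait(&d->cond, &d->lock);
        int stop = d->stop;
        pthread_mutex_unlock(&d->lock);
        if (stop)
            return NULL;

        /* Fill outside the lock so the reader can drain the other buffer. */
        size_t n = d->src_len - d->src_pos;
        if (n > CHUNK)
            n = CHUNK;
        memcpy(b->data, d->src + d->src_pos, n);
        d->src_pos += n;

        pthread_mutex_lock(&d->lock);
        b->len = n;
        b->pos = 0;
        b->filled = 1;
        pthread_cond_broadcast(&d->cond);
        pthread_mutex_unlock(&d->lock);
        if (n == 0)
            return NULL; /* published an empty chunk: end of stream */
    }
}

/* Copies up to size bytes in stream order; short only at end of stream. */
static size_t db_read(double_buf *d, char *out, size_t size) {
    size_t done = 0;
    pthread_mutex_lock(&d->lock);
    while (done < size) {
        chunk_buf *b = &d->bufs[d->cur];
        while (!b->filled)
            pthread_cond_wait(&d->cond, &d->lock);
        if (b->len == 0)
            break; /* end of stream: leave the empty chunk in place */
        assert(b->pos < b->len);
        size_t n = b->len - b->pos;
        if (n > size - done)
            n = size - done;
        memcpy(out + done, b->data + b->pos, n);
        b->pos += n;
        done += n;
        if (b->pos == b->len) {
            b->filled = 0; /* hand the drained buffer back to the fetcher */
            d->cur ^= 1;
            pthread_cond_broadcast(&d->cond);
        }
    }
    pthread_mutex_unlock(&d->lock);
    return done;
}

static int db_open(double_buf *d, const char *src, size_t len) {
    memset(d, 0, sizeof *d);
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->cond, NULL);
    d->src = src;
    d->src_len = len;
    return pthread_create(&d->thread, NULL, fetcher, d);
}

/* Safe at any point; wakes a fetcher waiting for a free buffer. */
static void db_close(double_buf *d) {
    pthread_mutex_lock(&d->lock);
    d->stop = 1;
    pthread_cond_broadcast(&d->cond);
    pthread_mutex_unlock(&d->lock);
    pthread_join(d->thread, NULL);
    pthread_cond_destroy(&d->cond);
    pthread_mutex_destroy(&d->lock);
}
```

The sketch only serves strictly sequential reads and must not be read from after `db_close()`; a real fuse_dfs version would also have to handle FUSE requests that jump to arbitrary offsets, for example by discarding both buffers and restarting the fetcher at the new position.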
> According to some measurements I did, increasing the read buffer size changes
> the runtime significantly: with 64MB, the runtime gets much closer to that of
> the HDFS client. Interestingly, 128MB as the buffer size does not perform
> well, but 256MB comes even closer to what the DFS client can provide
> (dfs vs fuse: 16 vs 18 seconds with rep3 files, and on par with EC-encoded
> files).
> So it seems worth streaming a larger chunk of data continuously, at least
> with pread. But if we have a separate fetching thread and double buffering,
> we don't even need positioned reads, just continuous streaming of data with
> read.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)