[jira] [Created] (HDFS-15692) Improve fuse_dfs read performance

Jira Mon, 23 Nov 2020 19:54:36 -0800

István Fajth created HDFS-15692:
-----------------------------------

             Summary: Improve fuse_dfs read performance
                 Key: HDFS-15692
                 URL: https://issues.apache.org/jira/browse/HDFS-15692
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: fuse-dfs
            Reporter: István Fajth



Currently fuse_dfs uses a prefetch buffer to read from HDFS via libhdfs' pread 
method.

The algorithm inside fuse_read.c, in short, does the following:
 if the rdbuffer size is less than the size of the buffer provided
 then
   read directly into the supplied buffer
 else
   grab lock
     if the prefetch buffer does not have more data
     then
       fill the prefetch buffer
     endif
     fill the supplied buffer via memcpy from the prefetch buffer
   release lock
 endif
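
To make the steps above concrete, here is a minimal C sketch of that read
path. It is not the code from fuse_read.c: backend_pread is a hypothetical
in-memory stand-in for libhdfs' pread, and handle_t, sketch_read and all field
names are made up for illustration. The refill condition is also an
assumption, the buffer is refilled whenever the requested range is not fully
inside it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for libhdfs' pread: serves bytes from memory. */
typedef struct {
    const char *data;
    size_t size;
    long calls;                 /* backend reads issued so far */
} backend_t;

static ssize_t backend_pread(backend_t *b, off_t off, void *dst, size_t len) {
    b->calls++;
    if (off < 0) return -1;
    if ((size_t)off >= b->size) return 0;       /* end of file */
    size_t n = b->size - (size_t)off;
    if (n > len) n = len;
    memcpy(dst, b->data + off, n);
    return (ssize_t)n;
}

/* Hypothetical per-file state; field names are illustrative only. */
typedef struct {
    backend_t *backend;
    pthread_mutex_t lock;
    char *buf;                  /* prefetch buffer */
    size_t buf_cap;             /* the "rdbuffer size" */
    size_t buf_len;             /* valid bytes currently in buf */
    off_t buf_off;              /* file offset of buf[0] */
} handle_t;

static ssize_t sketch_read(handle_t *h, char *out, size_t size, off_t off) {
    assert(h != NULL && out != NULL && off >= 0);

    /* Prefetch buffer smaller than the request: read directly. */
    if (h->buf_cap < size)
        return backend_pread(h->backend, off, out, size);

    pthread_mutex_lock(&h->lock);
    /* Refill unless [off, off + size) is already inside the buffer. */
    if (h->buf_len == 0 || off < h->buf_off ||
        (size_t)(off - h->buf_off) + size > h->buf_len) {
        ssize_t got = backend_pread(h->backend, off, h->buf, h->buf_cap);
        if (got < 0) {
            pthread_mutex_unlock(&h->lock);
            return -1;
        }
        h->buf_len = (size_t)got;
        h->buf_off = off;
    }
    size_t skip = (size_t)(off - h->buf_off);
    size_t n = h->buf_len - skip;   /* may be short near end of file */
    if (n > size) n = size;
    memcpy(out, h->buf + skip, n);
    pthread_mutex_unlock(&h->lock);
    return (ssize_t)n;
}
```

Note that in this sketch the refill happens while the lock is held, so a
reader that misses the buffer waits for the whole backend read before any
data is copied out.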

It would be nice to have a background thread and two prefetch buffers, so that 
while one buffer serves the reads coming from the local client, the other can 
prefetch more data. With that we can improve the read speed, especially for EC 
encoded files.

According to some measurements I did, increasing the read buffer changes the 
runtime significantly: with 64MB the runtime gets much closer to that of HDFS. 
Interestingly, 128MB as the buffer size does not perform well, but 256MB is 
even closer to what the dfs client can provide (16 vs 18 seconds with rep3 
files, and on par with EC encoded files, dfs vs fuse).

So it seems worthwhile to continuously stream larger chunks of data, at least 
with pread. If we have a separate fetching thread and double buffering, we do 
not even need positioned reads, just continuous streaming of the data with 
read.
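
A rough C sketch of the double-buffering idea could look like the following.
Everything in it is an assumption made for illustration, not a proposed
patch: stream_t and stream_read are a hypothetical in-memory stand-in for a
sequential HDFS read, the dbuf_* names are invented, there is exactly one
sequential reader, and the slot size is tiny so the hand-over between the two
buffers is easy to follow.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SLOT_CAP 8              /* tiny on purpose; real buffers would be MBs */

/* Hypothetical sequential source standing in for a streaming HDFS read. */
typedef struct { const char *data; size_t size, pos; } stream_t;

static size_t stream_read(stream_t *s, char *dst, size_t len) {
    size_t n = s->size - s->pos;
    if (n > len) n = len;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;                   /* 0 means end of stream */
}

typedef struct { char data[SLOT_CAP]; size_t len; int full; } slot_t;

typedef struct {
    stream_t *src;
    slot_t slot[2];
    int rd;                     /* slot the reader is draining */
    size_t rd_pos;              /* bytes already consumed from slot[rd] */
    int stop;
    pthread_mutex_t mu;
    pthread_cond_t cv;
    pthread_t tid;
} dbuf_t;

/* Background thread: fills the slots alternately, waits while its slot is full. */
static void *prefetch_main(void *arg) {
    dbuf_t *d = arg;
    for (int w = 0;; w ^= 1) {
        pthread_mutex_lock(&d->mu);
        while (d->slot[w].full && !d->stop)
            pthread_cond_wait(&d->cv, &d->mu);
        int stop = d->stop;
        pthread_mutex_unlock(&d->mu);
        if (stop) return NULL;
        /* Fetch without the lock so the reader can drain the other slot. */
        size_t n = stream_read(d->src, d->slot[w].data, SLOT_CAP);
        pthread_mutex_lock(&d->mu);
        d->slot[w].len = n;
        d->slot[w].full = 1;    /* a full slot with len 0 marks end of stream */
        pthread_cond_broadcast(&d->cv);
        pthread_mutex_unlock(&d->mu);
        if (n == 0) return NULL;
    }
}

static int dbuf_open(dbuf_t *d, stream_t *src) {
    assert(d != NULL && src != NULL);
    memset(d, 0, sizeof *d);
    d->src = src;
    pthread_mutex_init(&d->mu, NULL);
    pthread_cond_init(&d->cv, NULL);
    return pthread_create(&d->tid, NULL, prefetch_main, d);
}

/* Single-reader sequential read; returns bytes copied, 0 at end of stream. */
static size_t dbuf_read(dbuf_t *d, char *out, size_t size) {
    size_t done = 0;
    while (done < size) {
        slot_t *s = &d->slot[d->rd];
        pthread_mutex_lock(&d->mu);
        while (!s->full)
            pthread_cond_wait(&d->cv, &d->mu);
        pthread_mutex_unlock(&d->mu);
        if (s->len == 0) break;             /* end of stream */
        size_t n = s->len - d->rd_pos;
        if (n > size - done) n = size - done;
        memcpy(out + done, s->data + d->rd_pos, n);
        done += n;
        d->rd_pos += n;
        if (d->rd_pos == s->len) {          /* slot drained: hand it back */
            pthread_mutex_lock(&d->mu);
            s->full = 0;
            pthread_cond_broadcast(&d->cv);
            pthread_mutex_unlock(&d->mu);
            d->rd ^= 1;
            d->rd_pos = 0;
        }
    }
    return done;
}

static void dbuf_close(dbuf_t *d) {
    pthread_mutex_lock(&d->mu);
    d->stop = 1;
    pthread_cond_broadcast(&d->cv);
    pthread_mutex_unlock(&d->mu);
    pthread_join(d->tid, NULL);
    pthread_mutex_destroy(&d->mu);
    pthread_cond_destroy(&d->cv);
}
```

The lock here only guards the hand-over of a slot, not the fetch itself, so
the reader copies from one slot while the thread fills the other. Seeks and
error handling are left out; a real implementation would have to discard or
reposition the prefetched data when the client reads non-sequentially.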



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

