Mania Abdi created HDFS-15206:
---------------------------------
Summary: HDFS synchronous reads from local file system
Key: HDFS-15206
URL: https://issues.apache.org/jira/browse/HDFS-15206
Project: Hadoop HDFS
Issue Type: Improvement
Environment: !Screenshot from 2020-03-03 22-07-26.png!
Reporter: Mania Abdi
Attachments: Screenshot from 2020-03-03 22-07-26.png
Hello everyone,
I ran a simple benchmark that runs ``` hadoop fs -get /file1.txt ```, where
file1.txt is 1MB in size, and captured the request workflow using XTrace. From
the workflow trace, I noticed that the datanode issues a 64KB synchronous
read request to the local file system, sends the data back, and waits for
completion. I walked through the HDFS code and confirmed this pattern.
I have two suggestions. (1) Since the HDFS block size is usually 128MB, we
could map the block file into memory (mmap) via the FileChannel class, which
would let the file system prefetch and read ahead in the background instead of
the datanode reading synchronously from disk. (2) Use asynchronous read
operations against the datanode's local disk. Is there a rationale behind the
synchronous reads from the file system?
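
To make the two suggestions concrete, here is a minimal sketch using only the standard java.nio API. It is not the datanode's actual read path: the class name LocalReadSketch, the method names, and the ByteArrayOutputStream that stands in for the network send are all hypothetical and only for illustration. The first method maps the file and asks the OS to page it in. The second keeps one asynchronous disk read in flight while the previous chunk is being "sent".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of HDFS.
final class LocalReadSketch {

    // Suggestion (1): map the file read-only and hint the OS to page it in.
    // A single mapping is limited to Integer.MAX_VALUE bytes, which is
    // enough for a 128MB block file.
    static byte[] readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load(); // best-effort: the OS may or may not make every page resident
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    // Suggestion (2): double-buffered asynchronous reads. While one chunk is
    // being consumed, the read for the next chunk is already in flight.
    static byte[] readAsync(Path file, int chunkSize)
            throws IOException, InterruptedException, ExecutionException {
        // Assumption: this sink stands in for sending packets to the client.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer current = ByteBuffer.allocate(chunkSize);
            ByteBuffer next = ByteBuffer.allocate(chunkSize);
            long position = 0;
            Future<Integer> pending = ch.read(current, position);
            while (true) {
                int n = pending.get();      // wait for the in-flight read
                if (n <= 0) {
                    break;                  // -1 signals end of file
                }
                position += n;
                next.clear();
                pending = ch.read(next, position); // start the next read early
                current.flip();
                sink.write(current.array(), 0, current.limit()); // "send" this chunk
                ByteBuffer tmp = current;   // swap buffers for the next round
                current = next;
                next = tmp;
            }
        }
        return sink.toByteArray();
    }
}
```

Calling LocalReadSketch.readAsync(path, 64 * 1024) mirrors the 64KB request size seen in the trace. Whether either approach actually helps the datanode would need to be measured, since the OS page cache and read-ahead may already hide much of the disk latency.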
Code:
XTrace: [http://brownsys.github.io/tracing-framework/xtrace/server/]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)