When the DataNode was initially designed, Linux AIO was still early in its
adoption. Kernel support was there and the libraries were almost there. No
Java support, of course. We would have had to write a lot of native code for
it and use JNI. Also, AIO means bypassing the kernel page cache, since you are
doing it with O_DIRECT. We would have had to implement some sort of block data
caching on our own.
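To make the caching point concrete: if reads went through O_DIRECT, the DataNode itself would have to keep recently read chunks in memory. Below is a minimal sketch of such a cache, assuming a simple LRU policy keyed by a hypothetical (block ID, offset) pair. `BlockChunkCache` and its methods are illustrative names only, not HDFS classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not part of HDFS: a small LRU cache of block chunks,
 * the kind of user-space cache a DataNode would need if O_DIRECT reads
 * bypassed the kernel page cache.
 */
public class BlockChunkCache {
    private final Map<String, byte[]> lru;

    public BlockChunkCache(final int maxChunks) {
        // accessOrder = true: iteration order runs from least- to most-recently accessed.
        this.lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Evict the least-recently-used chunk once capacity is exceeded.
                return size() > maxChunks;
            }
        };
    }

    // Hypothetical key layout: block ID plus byte offset of the chunk.
    private static String key(long blockId, long offset) {
        return blockId + ":" + offset;
    }

    public synchronized byte[] get(long blockId, long offset) {
        return lru.get(key(blockId, offset));
    }

    public synchronized void put(long blockId, long offset, byte[] chunk) {
        lru.put(key(blockId, offset), chunk);
    }

    public synchronized int cachedChunks() {
        return lru.size();
    }

    public static void main(String[] args) {
        BlockChunkCache cache = new BlockChunkCache(2);
        cache.put(1L, 0L, new byte[] {1});
        cache.put(1L, 65536L, new byte[] {2});
        cache.get(1L, 0L);                 // touch chunk at offset 0
        cache.put(2L, 0L, new byte[] {3}); // evicts (1, 65536), the LRU entry
        System.out.println(cache.get(1L, 65536L) == null); // true
        System.out.println(cache.cachedChunks());          // 2
    }
}
```

A real cache would also need size limits in bytes, invalidation when a block is appended to or deleted, and memory accounting, which is part of why staying on the page cache was attractive.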

Another option was to build an async framework in the DataNode. Instead, the
community chose to use a pool of data transceiver threads in order to move
forward fast. There have been some discussions and efforts to improve this,
as the workload has changed since the early days. However, the current
approach still goes through the I/O schedulers on block devices, so you will
see a lot of I/O request merging for typical loads. These are not direct I/O,
so read-ahead does happen and the page cache is utilized.
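A rough sketch of the thread-pool model described above, assuming a fixed pool of worker threads where each request performs blocking, sequential 64KB reads through the page cache and writes them to the client stream. `TransceiverPoolSketch` and `sendBlock` are hypothetical names; this is not the actual DataXceiver code, and it omits packetization, checksums and client acks.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of the thread-pool model: one pooled thread per request,
 * each issuing blocking, sequential 64 KB buffered reads.
 */
public class TransceiverPoolSketch {
    static final int CHUNK = 64 * 1024; // 64 KB read size, as described in the thread

    // Blocking sequential copy. These are ordinary buffered reads (no O_DIRECT),
    // so kernel read-ahead and the block-device I/O scheduler still apply.
    static long sendBlock(Path blockFile, OutputStream out) throws IOException {
        byte[] buf = new byte[CHUNK];
        long total = 0;
        try (InputStream in = Files.newInputStream(blockFile)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("blk_", ".data");
        Files.write(f, new byte[200_000]);
        ExecutorService pool = Executors.newFixedThreadPool(4); // the "transceiver" pool
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                // Each submitted request occupies one pool thread until it finishes.
                results.add(pool.submit(() -> sendBlock(f, new ByteArrayOutputStream())));
            }
            for (Future<Long> r : results) {
                System.out.println(r.get()); // 200000 for each request
            }
        } finally {
            pool.shutdown();
            Files.deleteIfExists(f);
        }
    }
}
```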

Kihwal



On Wed, Mar 11, 2020 at 11:18 AM Wei-Chiu Chuang <weic...@apache.org> wrote:

> Hi David,
> We talked a bit about a similar topic on DataNode sockets a while back. Any
> feedback on the DataNode disk access?
>
> On Thu, Mar 5, 2020 at 4:16 PM Mania Abdi <abdi...@husky.neu.edu> wrote:
>
> > Hello everyone
> >
> > I have a question regarding HDFS, specifically the DataNode code in
> > version 2.7.2. I have posted my question as a Jira issue
> > <https://issues.apache.org/jira/projects/HDFS/issues/HDFS-15206>.
> >
> > I have observed that the DataNode issues sequential synchronous 64KB reads
> > to the local disk, then sends the data to the user and waits for the
> > acknowledgement from the user. I was wondering why the HDFS community did
> > not use file mapping or asynchronous reads from disk. This could allow the
> > disk scheduler to perform sequential reads from disk or to perform
> > read-ahead and prefetching. Could this lead to a performance improvement?
> >
> > I would appreciate it if you could help me find the answer to this
> > question from the Hadoop community's perspective.
> >
> > I asked Apache members, and they told me that the version I am pointing
> > to is old and that this part of the code was rewritten from scratch for
> > modern SSDs. Could you please help me find in which version this change
> > happened, and where I can find it?
> >
> > Many thanks
> > Mania
> >
>
