[
https://issues.apache.org/jira/browse/HDFS-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547266#comment-14547266
]
Ahmed Mahran commented on HDFS-8416:
------------------------------------
A high-level implementation suggestion:
- Add an optional modifier (+SHARED) to the storage type when configuring the
datanode's data dirs (dfs.datanode.data.dir). This tells the datanode that the
given path is on shared storage and can be accessed by DFS clients that are
not necessarily co-located with the datanode. For example,
dfs.datanode.data.dir could be set to
[DISK+SHARED]/home/host/hdfs1-1,[ARCHIVE+SHARED]/home/host/hdfs1-2
- If the DFS client is co-located with the datanode and the site configuration
enables short-circuit reads, it proceeds with the current read protocol.
Otherwise, it should indicate in its read request that it is willing to read
the block directly.
- When the datanode receives a read request in which the client has indicated
its willingness to read the block directly, it checks whether the data dir
holding the block is shared (that is, whether it has the +SHARED modifier).
If the dir is shared, the datanode replies with the direct block path;
otherwise, the usual protocol takes over.
- As a security precaution, the data dirs of a datanode could be mounted in
read-only mode on the machines that do not host the datanode.
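
The steps above can be illustrated with a rough sketch. It is written in Python
for brevity and is not HDFS code: the +SHARED modifier is only proposed in this
comment, and DataDir, parse_data_dirs, handle_read, the reply tags and the block
file names are all hypothetical names chosen for illustration. The fallback for
entries without a bracketed storage type (DISK, not shared) is likewise an
assumption of the sketch.

```python
import re
from dataclasses import dataclass

# Hypothetical syntax: "[TYPE]/path" or "[TYPE+SHARED]/path".
_ENTRY = re.compile(r"^\[(?P<type>[A-Za-z_]+)(?P<shared>\+SHARED)?\](?P<path>.+)$")


@dataclass(frozen=True)
class DataDir:
    storage_type: str
    shared: bool
    path: str


def parse_data_dirs(value):
    """Parse a comma-separated dfs.datanode.data.dir value into DataDir entries."""
    dirs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        m = _ENTRY.match(entry)
        if m:
            dirs.append(DataDir(m.group("type").upper(),
                                m.group("shared") is not None,
                                m.group("path")))
        else:
            # Assumption: an untagged entry is a plain, non-shared DISK dir.
            dirs.append(DataDir("DISK", False, entry))
    return dirs


def handle_read(dirs, block_path, client_wants_direct):
    """Datanode-side decision: reply with the block path or fall back to streaming.

    Returns ("DIRECT_PATH", path) only if the client asked for a direct read
    and the block lives under a dir marked +SHARED; otherwise ("STREAM", None).
    """
    if client_wants_direct:
        for d in dirs:
            prefix = d.path.rstrip("/") + "/"
            if d.shared and block_path.startswith(prefix):
                return ("DIRECT_PATH", block_path)
    return ("STREAM", None)


if __name__ == "__main__":
    conf = "[DISK+SHARED]/home/host/hdfs1-1,[ARCHIVE]/home/host/hdfs1-2"
    dirs = parse_data_dirs(conf)
    print(handle_read(dirs, "/home/host/hdfs1-1/blk_1", True))
    print(handle_read(dirs, "/home/host/hdfs1-2/blk_2", True))
```

In this sketch the datanode never hands out a path unless both conditions hold
(the client asked for a direct read and the dir is marked shared), so clients
and data dirs that do not opt in keep the usual streaming protocol.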
> Short circuit remote reads from shared storage
> ----------------------------------------------
>
> Key: HDFS-8416
> URL: https://issues.apache.org/jira/browse/HDFS-8416
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, hdfs-client, nfs, performance
> Reporter: Ahmed Mahran
>
> In a Hadoop cluster configuration that employs a shared storage system, HDFS
> read and write operations are very expensive in terms of network bandwidth
> consumption.
> For a DFS client to read a block from a remote datanode, the block is
> transmitted first from the shared storage to the datanode then from the
> datanode to the DFS client. Short-circuiting the shared-storage-to-datanode
> hop and allowing the client to access the shared storage directly would
> improve performance substantially.
> This blog post describes the issue and provides a hack for the remote read.
> http://www.badrit.com/blog/2015/3/20/hdfs-short-circuit-shared-storage-remote-read-hacking-the-hdfs-short-circuit-local-read-for-short-circuiting-remote-reads-from-a-shared-storage