[ 
https://issues.apache.org/jira/browse/HDFS-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567127#comment-14567127
 ] 

Steve Loughran commented on HDFS-8416:
--------------------------------------

This is a silly question, but why use HDFS here? If it's a shared FS (e.g. 
an NFS mount), why not just mount it and use it directly? All HDFS would appear 
to be doing here is adding complexity and another failure point. Or is this 
some kind of hybrid system where the writes are always done by one DN, but 
clients from other systems can read directly via the mount points?

Security-wise, the main issue would be that all blocks in a DN belong to the 
DN: user identity is not checked at the DN, merely whether or not the caller 
has a valid token. For a shared FS, you'd really need the underlying FS to 
implement permissions, Kerberos auth, etc.
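
To make the token point concrete, here is a toy model in Python. It is not 
HDFS code: SECRET, issue_token and datanode_read are hypothetical names, and 
the HMAC scheme is only an assumption standing in for whatever the real block 
token looks like. It just illustrates that a check of this shape accepts 
anyone holding a valid token, whoever they are.

```python
import hashlib
import hmac

# Toy model only, not HDFS code. All names here are hypothetical.
# SECRET stands in for key material shared by the NameNode and DataNode.
SECRET = b"shared-between-namenode-and-datanode"


def issue_token(block_id: str) -> str:
    """NameNode side (assumed): sign the block id after its own permission check."""
    return hmac.new(SECRET, block_id.encode(), hashlib.sha256).hexdigest()


def datanode_read(block_id: str, token: str, caller: str) -> bool:
    """DataNode side (assumed): only token validity is checked.

    `caller` is accepted but deliberately never consulted, which is the
    point being illustrated.
    """
    expected = issue_token(block_id)
    return hmac.compare_digest(expected, token)


token = issue_token("blk_1001")
alice_ok = datanode_read("blk_1001", token, caller="alice")
# Same token presented by a different user: still accepted.
mallory_ok = datanode_read("blk_1001", token, caller="mallory")
# A forged token is rejected regardless of who presents it.
forged_ok = datanode_read("blk_1001", "0" * 64, caller="alice")
```

If clients bypass the DN and read block files straight off the shared mount, 
even this token check is gone, so the file permissions and authentication of 
the underlying FS are all that is left.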

Ignoring that detail, I can see why you would get a performance speedup, 
especially with NFS caching accelerating reads. And it's a very good writeup.

> Short circuit remote reads from shared storage
> ----------------------------------------------
>
>                 Key: HDFS-8416
>                 URL: https://issues.apache.org/jira/browse/HDFS-8416
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, hdfs-client, nfs, performance
>            Reporter: Ahmed Mahran
>
> In a Hadoop cluster configuration that employs a shared storage system, HDFS 
> read and write operations are very expensive in terms of network bandwidth 
> consumption.
> For a DFS client to read a block from a remote datanode, the block is 
> transmitted first from the shared storage to the datanode then from the 
> datanode to the DFS client. Short circuiting the shared storage to datanode 
> hop and allowing the client to directly access the shared storage would 
> improve the performance substantially.
> This blog post describes the issue and provides a hack for the remote read.
> http://www.badrit.com/blog/2015/3/20/hdfs-short-circuit-shared-storage-remote-read-hacking-the-hdfs-short-circuit-local-read-for-short-circuiting-remote-reads-from-a-shared-storage



