[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955426#comment-16955426
 ] 

Xiaoqiao He commented on HDFS-14283:
------------------------------------

Thanks [~leosun08] for your work.
{quote}But i have a problem that current block.getLocations() which gets a list 
of DataNodes in priority order does not consider choosed DN LOAD, bandwidth 
etc. I think it is necessary to add this logic later.{quote}
HDFS-14882 is in progress now; it would be a pleasure if you are interested in 
reviewing it.
For this ticket, I am concerned about which should take priority, topology 
distance or cached state, or whether we should leave the choice to the user.
Consider the following case: a block has 3 replicas (ra, rb, and rc), and the 
cache replication is set to 2, so rb and rc are cached. A client then reads the 
block from a location that is topologically closer to the host holding ra than 
to the hosts holding rb/rc (one corner case is a client running on the same 
host as ra). Which host should the client request first? I believe both ra and 
rb/rc are reasonable choices. [^HDFS-14283.003.patch] seems to choose the 
cache-first policy, right? I suggest it may be better to leave the choice to 
the user.
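
To make the case concrete, here is a minimal standalone sketch of what a 
user-selectable policy could look like. It is not taken from any patch on this 
ticket: the class {{ReplicaOrderSketch}}, the {{Policy}} enum, the {{Replica}} 
type, and the distance values (0 = same host, 2 = same rack, 4 = off rack) are 
all assumptions for illustration, and no real HDFS class or configuration key 
is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReplicaOrderSketch {
    /** Hypothetical policy switch; not an existing HDFS configuration key. */
    public enum Policy { DISTANCE_FIRST, CACHED_FIRST }

    /** Hypothetical stand-in for one replica location as seen by one client. */
    public static final class Replica {
        final String name;
        final int distance;   // assumed topology distance from the client
        final boolean cached; // whether this replica is cached in memory

        public Replica(String name, int distance, boolean cached) {
            this.name = name;
            this.distance = distance;
            this.cached = cached;
        }
    }

    /** Returns a new list ordered by the chosen policy; the input is not modified. */
    public static List<Replica> order(List<Replica> replicas, Policy policy) {
        Comparator<Replica> byDistance =
            Comparator.comparingInt((Replica r) -> r.distance);
        // Boolean.FALSE sorts before Boolean.TRUE, so negate to put cached first.
        Comparator<Replica> cachedFirst =
            Comparator.comparing((Replica r) -> !r.cached);

        Comparator<Replica> cmp = (policy == Policy.CACHED_FIRST)
            ? cachedFirst.thenComparing(byDistance)   // cache wins, distance breaks ties
            : byDistance.thenComparing(cachedFirst);  // distance wins, cache breaks ties

        List<Replica> sorted = new ArrayList<>(replicas);
        sorted.sort(cmp);
        return sorted;
    }

    /** Helper that extracts replica names, in order. */
    public static List<String> names(List<Replica> replicas) {
        List<String> out = new ArrayList<>();
        for (Replica r : replicas) {
            out.add(r.name);
        }
        return out;
    }

    public static void main(String[] args) {
        // The case above: ra is local but uncached; rb and rc are cached but remote.
        List<Replica> replicas = Arrays.asList(
            new Replica("ra", 0, false),
            new Replica("rb", 2, true),
            new Replica("rc", 4, true));

        System.out.println(names(order(replicas, Policy.CACHED_FIRST)));   // [rb, rc, ra]
        System.out.println(names(order(replicas, Policy.DISTANCE_FIRST))); // [ra, rb, rc]
    }
}
```

With CACHED_FIRST the client in the example tries rb first; with DISTANCE_FIRST 
it tries the local replica ra first. Either order looks defensible, which is 
why a switch of this kind might be worth exposing to the user.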

> DFSInputStream to prefer cached replica
> ---------------------------------------
>
>                 Key: HDFS-14283
>                 URL: https://issues.apache.org/jira/browse/HDFS-14283
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>         Environment: HDFS Caching
>            Reporter: Wei-Chiu Chuang
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch
>
>
> HDFS Caching offers performance benefits. However, currently NameNode does 
> not treat cached replica with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let NameNode give higher priority to cached replica. 
> Changing a logic in NameNode is always tricky so that didn't get much 
> traction. Here I propose a different approach: let client (DFSInputStream) 
> prefer cached replica.
> A {{LocatedBlock}} object already contains cached replica location so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

