[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955426#comment-16955426 ]
Xiaoqiao He commented on HDFS-14283:
------------------------------------

Thanks [~leosun08] for your work.

{quote}But I have a problem: the current block.getLocations(), which returns a list of DataNodes in priority order, does not consider the chosen DN's load, bandwidth, etc. I think it is necessary to add this logic later.{quote}

HDFS-14882 is in progress now; it would be a pleasure if you were interested in reviewing it.

For this ticket, my concern is which should take priority: topology distance or cached status. Or should the choice be left to the user? Consider the following case: one block has 3 replicas (named ra, rb and rc), and the cache replication is set to 2, so rb and rc are cached. Another client is topologically closer to the host where ra is located than to the hosts of rb/rc (a corner case is a client running on the same host as ra). Which host should the client request first? I believe either ra or rb/rc is a reasonable answer. [^HDFS-14283.003.patch] seems to choose the cache-first policy, right? I just suggest that it may be better to leave the choice to the user.

> DFSInputStream to prefer cached replica
> ---------------------------------------
>
>                 Key: HDFS-14283
>                 URL: https://issues.apache.org/jira/browse/HDFS-14283
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>         Environment: HDFS Caching
>            Reporter: Wei-Chiu Chuang
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch,
> HDFS-14283.003.patch
>
>
> HDFS Caching offers performance benefits. However, currently the NameNode does
> not give a cached replica higher priority, so HDFS caching is only useful
> when cache replication = 3, that is to say, when all replicas are cached in
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let the NameNode give higher priority to cached replicas.
> Changing logic in the NameNode is always tricky, so that proposal didn't get much
> traction.
> Here I propose a different approach: let the client (DFSInputStream)
> prefer cached replicas.
> A {{LocatedBlock}} object already contains cached replica locations, so a
> client has the needed information. I think we can change
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
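The ra/rb/rc trade-off raised in the comment can be illustrated with a small, self-contained sketch of a user-selectable ordering policy. It is not HDFS code. {{Replica}}, {{ReadPreference}}, {{orderReplicas}} and {{hosts}} are hypothetical names, and the sketch does not use the real {{LocatedBlock}} or {{DFSInputStream#getBestNodeDNAddrPair()}} APIs. It assumes the client already knows each replica's topology distance (the values 0 and 4 are assumed, loosely following the NetworkTopology convention of 0 for the same node and 4 for a different rack) and which hosts hold a cached copy, which the issue notes {{LocatedBlock}} already carries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not HDFS code: Replica, ReadPreference, orderReplicas and
// hosts are illustrative names only.
public class ReplicaOrderSketch {

  // Which criterion wins when cache status and topology distance disagree.
  enum ReadPreference { CACHE_FIRST, DISTANCE_FIRST }

  // One replica location: its host and its (assumed) topology distance to the client.
  record Replica(String host, int distance) {}

  // Returns a new list ordered by the chosen policy; the input list is not modified.
  static List<Replica> orderReplicas(List<Replica> replicas, Set<String> cachedHosts,
                                     ReadPreference pref) {
    // Key 0 for a replica on a host with a cached copy, 1 otherwise.
    Comparator<Replica> byCache =
        Comparator.comparingInt(r -> cachedHosts.contains(r.host()) ? 0 : 1);
    // Smaller topology distance sorts first.
    Comparator<Replica> byDistance = Comparator.comparingInt(Replica::distance);
    Comparator<Replica> order = pref == ReadPreference.CACHE_FIRST
        ? byCache.thenComparing(byDistance)
        : byDistance.thenComparing(byCache);
    List<Replica> sorted = new ArrayList<>(replicas);
    sorted.sort(order); // stable sort: full ties keep the original order
    return sorted;
  }

  // Extracts just the host names, in order.
  static List<String> hosts(List<Replica> replicas) {
    return replicas.stream().map(Replica::host).toList();
  }

  public static void main(String[] args) {
    // The case from the comment: ra is on the client's own host, rb and rc are cached elsewhere.
    List<Replica> replicas = List.of(
        new Replica("ra", 0), new Replica("rb", 4), new Replica("rc", 4));
    Set<String> cached = Set.of("rb", "rc");
    System.out.println(hosts(orderReplicas(replicas, cached, ReadPreference.CACHE_FIRST)));    // [rb, rc, ra]
    System.out.println(hosts(orderReplicas(replicas, cached, ReadPreference.DISTANCE_FIRST))); // [ra, rb, rc]
  }
}
```

With CACHE_FIRST the client would try rb or rc first; with DISTANCE_FIRST it would try the local ra first. Keeping the policy as a single two-key comparator makes the choice easy to expose to users, and because {{List.sort}} is stable, replicas that tie on both keys keep the order the NameNode returned. The sketch does not model dead or deprioritized nodes.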