[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

Siyao Meng (Jira) Tue, 08 Oct 2019 16:06:00 -0700


    [ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947235#comment-16947235 ]


Siyao Meng commented on HDFS-14283:
-----------------------------------

[~leosun08] After some asking and digging, it seems HDFS block read sorting 
doesn't consider DN load, at least for now (block write does; see config 
dfs.namenode.redundancy.considerLoad, which is used in the block placement 
policy). This makes sense: a block read is typically much quicker than a write 
(especially from memory, as in our case), so the most important factor here 
should just be distance.

I'm fine with the approach in patch rev 002, which only sorts the cached block 
locations by distance.

One final thing: would you add a test for this in TestDatanodeManager (or any 
other test class, or a new test class, as you see fit)?
We want to make sure the logic works as intended:
(1) when the block is cached on one or more DataNodes, it should return the 
location of the nearest DataNode that has the cached block.
(2) when the block *isn't cached* on any DataNode, it should fall back to the 
strategy without block cache (i.e. return the location of the nearest DataNode).
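
To make the two cases concrete, here is a rough sketch of the expected 
selection in plain Java. It is not Hadoop code and not what the patch does 
internally: Replica and ReplicaChooser are hypothetical names, and the integer 
distance is only an illustrative stand-in for network distance.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for one replica location; not a Hadoop class.
final class Replica {
    final String host;
    final int distance;   // illustrative network distance; smaller is closer
    final boolean cached; // true if this DataNode holds the block in cache

    Replica(String host, int distance, boolean cached) {
        this.host = host;
        this.distance = distance;
        this.cached = cached;
    }
}

// Hypothetical helper that encodes cases (1) and (2).
final class ReplicaChooser {
    static Optional<Replica> choose(List<Replica> replicas) {
        Comparator<Replica> byDistance = Comparator.comparingInt(r -> r.distance);

        // Case (1): at least one cached replica exists, so take the nearest one.
        Optional<Replica> nearestCached = replicas.stream()
                .filter(r -> r.cached)
                .min(byDistance);
        if (nearestCached.isPresent()) {
            return nearestCached;
        }

        // Case (2): nothing is cached, so fall back to the nearest replica.
        return replicas.stream().min(byDistance);
    }
}
```

A test along these lines would build one replica list with cached locations and 
one without, then assert on the chosen host in each case.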

> DFSInputStream to prefer cached replica
> ---------------------------------------
>
>                 Key: HDFS-14283
>                 URL: https://issues.apache.org/jira/browse/HDFS-14283
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>         Environment: HDFS Caching
>            Reporter: Wei-Chiu Chuang
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch
>
>
> HDFS Caching offers performance benefits. However, currently NameNode does 
> not treat cached replica with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let NameNode give higher priority to cached replica. 
> Changing logic in the NameNode is always tricky, so that didn't get much 
> traction. Here I propose a different approach: let client (DFSInputStream) 
> prefer cached replica.
> A {{LocatedBlock}} object already contains cached replica location so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
