[ 
https://issues.apache.org/jira/browse/HDFS-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148823#comment-14148823
 ] 

Arpit Agarwal commented on HDFS-7142:
-------------------------------------

Thanks for writing this up. Feedback below.


In {{getReplica}}:
{code}
  private RamDiskReplica2Q getReplica(final String bpid, final long blockId) {
    RamDiskReplica2Q exemplar = new RamDiskReplica2Q(bpid, blockId,
        null, 0L, PersistenceTier.NUM_IN_MEMORY_TIERS);
    RamDiskReplica2Q replica = replicasSortedByBlockPoolAndId.ceiling(exemplar);
{code}

{{ceiling}} returns the least element greater than or equal to the exemplar, 
so it need not return an exact match. If the result is non-null, you need to 
check that its blockId and bpid match the requested ones. Perhaps I 
misunderstood the intention.
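
To illustrate the check I have in mind, here is a minimal sketch. It is not the code from the patch: {{Key}}, {{BY_POOL_AND_ID}} and {{getExact}} are hypothetical names, and I am assuming the set is ordered by bpid and then blockId.

```java
import java.util.Comparator;
import java.util.TreeSet;

class CeilingLookupSketch {
  /** Hypothetical stand-in for a replica; not the class from the patch. */
  static final class Key {
    final String bpid;
    final long blockId;

    Key(String bpid, long blockId) {
      this.bpid = bpid;
      this.blockId = blockId;
    }
  }

  /** Assumed ordering: block pool ID first, then block ID. */
  static final Comparator<Key> BY_POOL_AND_ID =
      Comparator.comparing((Key k) -> k.bpid)
          .thenComparingLong(k -> k.blockId);

  /** Returns the exact match for (bpid, blockId), or null if absent. */
  static Key getExact(TreeSet<Key> set, String bpid, long blockId) {
    // ceiling() may return a larger neighbour, so verify both fields.
    Key candidate = set.ceiling(new Key(bpid, blockId));
    if (candidate == null
        || candidate.blockId != blockId
        || !candidate.bpid.equals(bpid)) {
      return null;
    }
    return candidate;
  }
}
```

With only blocks 10 and 30 of one pool in the set, a bare {{ceiling}} lookup for block 20 returns block 30, whereas {{getExact}} returns null.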

{{dequeueNextReplicaToPersist}} appears to have a starvation issue. If replicas 
in a higher blockPoolId keep getting added, a replica in a lower bpid may wait 
indefinitely to be persisted. It would be better to persist replicas in the 
same order in which they were originally added; you can do that with an 
additional set.

{{dequeueNextReplicaToPersist}} should not remove the replica from 
{{replicasSortedByTierAndLastUsed}}. Replicas that are persisted can still stay 
in memory indefinitely if there is free RAM or if the replica is being used. 
The additional set to track persistence can help here too. 
{{numReplicasNotPersisted}} has the same issue: it should count only replicas 
that have not yet been persisted, not all replicas in RAM. Let me clarify the 
documentation on {{RamDiskReplicaTracker.dequeueNextReplicaToPersist}}.
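
A rough sketch of the "additional set" idea, covering both the ordering and the counting concerns. This is only an illustration under my own assumptions: {{TwoSetTrackerSketch}}, {{inMemory}} and {{pendingPersist}} are hypothetical names, replicas are reduced to a "bpid:blockId" string, and the eviction ordering itself is left out.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

class TwoSetTrackerSketch {
  // All replicas currently in RAM, persisted or not (eviction candidates).
  private final Set<String> inMemory = new LinkedHashSet<>();
  // Replicas not yet written to disk, kept in the order they were added.
  private final LinkedHashSet<String> pendingPersist = new LinkedHashSet<>();

  synchronized void addReplica(String bpid, long blockId) {
    String key = bpid + ":" + blockId;
    inMemory.add(key);
    pendingPersist.add(key);
  }

  /** Returns the oldest unpersisted replica key, or null if none is pending. */
  synchronized String dequeueNextReplicaToPersist() {
    Iterator<String> it = pendingPersist.iterator();
    if (!it.hasNext()) {
      return null;
    }
    String next = it.next();
    it.remove(); // only the pending set shrinks; inMemory is untouched
    return next;
  }

  /** Counts only replicas still waiting to be persisted. */
  synchronized int numReplicasNotPersisted() {
    return pendingPersist.size();
  }

  /** Counts every replica in RAM, including persisted ones. */
  synchronized int numReplicasInMemory() {
    return inMemory.size();
  }
}
```

Because the pending set is insertion-ordered, a replica in a lower bpid is dequeued as soon as everything added before it has been handed out, no matter how many replicas arrive later in other block pools.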




> Implement a 2Q eviction strategy for HDFS-6581
> ----------------------------------------------
>
>                 Key: HDFS-7142
>                 URL: https://issues.apache.org/jira/browse/HDFS-7142
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: 0002-Add-RamDiskReplica2QTracker.patch
>
>
> We should implement a 2Q or approximate 2Q eviction strategy for HDFS-6581.  
> It is well known that LRU is a poor fit for scanning workloads, which HDFS 
> may often encounter. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
