[
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106025#comment-14106025
]
Colin Patrick McCabe commented on HDFS-6581:
--------------------------------------------
Looks good overall. It's good to see progress on this.
Some comments about the design doc:
* Why not use ramfs instead of tmpfs? ramfs can't swap.
** The problem with using tmpfs is that the system could move the data to swap
at any time. In addition to performance problems, this could cause correctness
problems later when we read back the data from swap (i.e. from the hard disk).
Since we don't want to verify checksums here, we should use a storage method
that we know never touches the disk. Tachyon uses ramfs instead of tmpfs for
this reason.
* An LRU replacement policy isn't a good choice. Under LRU, it's very easy for
a batch job to kick everything else out of memory before any of it can be used
again (thrashing). An LFU (least frequently used) policy would be much better.
We'd have to keep usage statistics to implement this, but that doesn't seem too
bad.
* How is the maximum tmpfs/ramfs size per datanode configured? I think we
should use the existing {{dfs.datanode.max.locked.memory}} property to
configure this, for consistency. System administrators should not need to
configure separate pools of memory for HDFS-4949 and this feature. It should
be one memory size.
** I also think that cache directives from HDFS-4949 should take precedence
over this opportunistic write caching. If we need to evict some HDFS-5851
cache items to finish our HDFS-4949 caching, we should do that.
* Related to that, we might want to rename {{dfs.datanode.max.locked.memory}}
to {{dfs.data.node.max.cache.memory}} or something.
* You can effectively revoke access to a block file stored in ramfs or tmpfs by
truncating that file to 0 bytes. The client can hang on to the file
descriptor, but this doesn't keep any data bytes in memory. So we can move
things out of the cache even if the clients are unresponsive. Also see
HDFS-6750 and HDFS-6036 for examples of how we can ask the clients to stop
using a short-circuit replica before tearing it down.
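To make the LFU suggestion concrete, here's a rough sketch of counter-based eviction (plain Java with hypothetical names, not tied to any DataNode class): each cached block keeps an access count, and eviction drops the lowest count, so a one-pass batch scan can no longer flush a frequently-read block the way it can under LRU.

```java
import java.util.HashMap;
import java.util.Map;

public class LfuSketch {
    // Minimal LFU cache; the O(n) eviction scan is for illustration only.
    static class LfuCache<K, V> {
        private final int capacity;
        private final Map<K, V> values = new HashMap<>();
        private final Map<K, Long> counts = new HashMap<>();

        LfuCache(int capacity) { this.capacity = capacity; }

        V get(K key) {
            V v = values.get(key);
            if (v != null) counts.merge(key, 1L, Long::sum); // record the access
            return v;
        }

        void put(K key, V value) {
            if (!values.containsKey(key) && values.size() >= capacity) {
                // Evict the least frequently used entry.
                K victim = null;
                long fewest = Long.MAX_VALUE;
                for (Map.Entry<K, Long> e : counts.entrySet()) {
                    if (e.getValue() < fewest) {
                        fewest = e.getValue();
                        victim = e.getKey();
                    }
                }
                values.remove(victim);
                counts.remove(victim);
            }
            values.put(key, value);
            counts.merge(key, 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        LfuCache<String, String> cache = new LfuCache<>(2);
        cache.put("hot-block", "data");
        cache.get("hot-block");
        cache.get("hot-block");
        cache.put("scan-1", "data");  // cache now full
        cache.put("scan-2", "data");  // evicts scan-1 (count 1), not hot-block (count 3)
        System.out.println(cache.get("hot-block") != null);  // true
        System.out.println(cache.get("scan-1") == null);     // true
    }
}
```

A real implementation would want O(1) eviction (e.g. frequency buckets) plus some aging of the counters, so that blocks that were hot long ago don't stay pinned forever.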
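If the single-budget approach is adopted, the admin-facing configuration stays one hdfs-site.xml entry. The property already exists for HDFS-4949 and is specified in bytes (it also has to fit under the DataNode's locked-memory ulimit); the value below is purely illustrative:

```xml
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <!-- Single memory budget, in bytes, shared by HDFS-4949 cache
       directives and (under this proposal) in-memory replicas. -->
  <value>2147483648</value>
</property>
```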
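The truncate-to-revoke behavior is easy to demonstrate. The sketch below uses an ordinary temp file rather than a real ramfs block file, but the descriptor semantics are the same: the "client" keeps its fd open across the truncation, yet its reads hit EOF immediately afterwards, so holding the fd doesn't pin any data in memory.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class TruncateRevoke {
    // Returns what the "client" reads after the "datanode" truncates the file.
    static int readAfterTruncate() throws IOException {
        File blockFile = File.createTempFile("replica", ".blk"); // stand-in for a ramfs block file
        try (FileOutputStream out = new FileOutputStream(blockFile)) {
            out.write(new byte[]{1, 2, 3, 4});
        }

        // The client opens the block and hangs on to the descriptor.
        try (FileInputStream client = new FileInputStream(blockFile)) {
            // The datanode truncates the file to 0 bytes: the client's fd stays
            // valid, but it no longer keeps any data pages alive.
            try (RandomAccessFile dn = new RandomAccessFile(blockFile, "rw")) {
                dn.setLength(0);
            }
            return client.read(); // -1: immediate EOF, the data is gone
        } finally {
            blockFile.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterTruncate()); // prints -1
    }
}
```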
> Write to single replica in memory
> ---------------------------------
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can
> revisit at a later time.
--
This message was sent by Atlassian JIRA
(v6.2#6252)