[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106232#comment-14106232 ]

Arpit Agarwal commented on HDFS-6581:
-------------------------------------

Thank you for taking the time to look at the doc and provide feedback.

bq. The problem with using tmpfs is that the system could move the data to swap 
at any time. In addition to performance problems, this could cause correctness 
problems later when we read back the data from swap (i.e. from the hard disk). 
Since we don't want to verify checksums here, we should use a storage method 
that we know never touches the disk. Tachyon uses ramfs instead of tmpfs for 
this reason.
The implementation makes no assumptions about the underlying platform, whether 
it is tmpfs or ramfs. I think renaming TMPFS to RAM, as Gopal suggested, will 
avoid confusion. I do prefer tmpfs, since the OS caps tmpfs usage at the 
configured size, so the failure case is safer (DiskOutOfSpace instead of 
exhausting all RAM). Swap is not much of a concern since it is usually 
disabled.
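
For illustration, a minimal sketch of that failure mode, assuming a tmpfs 
mount at a hypothetical path /mnt/dn-ram with a size cap (e.g. 
mount -t tmpfs -o size=1g tmpfs /mnt/dn-ram); once the cap is hit the writer 
sees an IOException instead of the system exhausting RAM:

{code:java}
import java.io.IOException;
import java.nio.file.*;

public class TmpfsCapDemo {
  // Hypothetical mount point, e.g. "mount -t tmpfs -o size=1g tmpfs /mnt/dn-ram"
  private static final Path RAM_DIR = Paths.get("/mnt/dn-ram");

  public static void main(String[] args) throws IOException {
    byte[] chunk = new byte[64 * 1024 * 1024]; // 64 MB per write
    Path blockFile = RAM_DIR.resolve("blk_demo");
    try {
      while (true) {
        // Append until the tmpfs size cap is exceeded.
        Files.write(blockFile, chunk, StandardOpenOption.CREATE,
            StandardOpenOption.APPEND);
      }
    } catch (IOException e) {
      // On tmpfs the kernel enforces the mount's size= option, so the
      // writer sees a disk-full error rather than consuming all memory.
      System.err.println("Write rejected at tmpfs cap: " + e);
    } finally {
      Files.deleteIfExists(blockFile);
    }
  }
}
{code}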

bq. An LRU replacement policy isn't a good choice. It's very easy for a batch 
job to kick out everything in memory before it can ever be used again 
(thrashing). An LFU (least frequently used) policy would be much better. We'd 
have to keep usage statistics to implement this, but that doesn't seem too bad.
Agreed that plain LRU would be a poor choice. Perhaps a hybrid of MRU+LRU 
would be a good option: evict the most recently read replica, unless there are 
replicas older than some threshold, in which case evict the least recently 
used one. The assumption is that a client is unlikely to reread a recently 
read replica.
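
A rough sketch of that hybrid policy, using illustrative names (RamReplica, 
ageThresholdMillis) that are not from the design doc:

{code:java}
import java.util.Comparator;
import java.util.List;

/** Toy model of an in-memory replica; names here are illustrative only. */
class RamReplica {
  final long blockId;
  volatile long lastReadMillis;
  RamReplica(long blockId, long lastReadMillis) {
    this.blockId = blockId;
    this.lastReadMillis = lastReadMillis;
  }
}

class HybridEvictionPolicy {
  private final long ageThresholdMillis;

  HybridEvictionPolicy(long ageThresholdMillis) {
    this.ageThresholdMillis = ageThresholdMillis;
  }

  /** Pick a victim: LRU if anything is older than the threshold, else MRU. */
  RamReplica chooseVictim(List<RamReplica> replicas, long nowMillis) {
    if (replicas.isEmpty()) {
      return null;
    }
    RamReplica lru = replicas.stream()
        .min(Comparator.comparingLong(r -> r.lastReadMillis)).get();
    if (nowMillis - lru.lastReadMillis > ageThresholdMillis) {
      return lru;  // Stale replica present: fall back to plain LRU.
    }
    // Everything was read recently; assume a recently read replica is
    // unlikely to be reread soon, so evict the most recently used one.
    return replicas.stream()
        .max(Comparator.comparingLong(r -> r.lastReadMillis)).get();
  }
}
{code}

This would avoid the batch-scan thrashing described above, since a scan that 
touches each replica once evicts its own most recent read rather than pushing 
out older, still-warm replicas.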

bq. You can effectively revoke access to a block file stored in ramfs or tmpfs 
by truncating that file to 0 bytes. The client can hang on to the file 
descriptor, but this doesn't keep any data bytes in memory. So we can move 
things out of the cache even if the clients are unresponsive. Also see 
HDFS-6750 and HDFS-6036 for examples of how we can ask the clients to stop 
using a short-circuit replica before tearing it down.
Yes, I reviewed the former; it looks interesting with eviction in mind. I'll 
create a subtask to investigate eviction via truncate.
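
A minimal sketch of truncate-based revocation, assuming a replica file at a 
hypothetical ramfs/tmpfs path; the client's descriptor stays open, but 
truncating to zero releases the backing pages so memory can be reclaimed even 
from an unresponsive client:

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class TruncateRevocation {
  public static void main(String[] args) throws IOException {
    Path replica = Paths.get("/mnt/dn-ram/blk_demo");  // hypothetical path
    Files.write(replica, new byte[8 * 1024 * 1024]);   // 8 MB "replica"

    // A client holds the file open (e.g. via a short-circuit read fd).
    try (RandomAccessFile client = new RandomAccessFile(replica.toFile(), "r")) {
      // The DataNode revokes access by truncating the file to 0 bytes.
      try (FileChannel dn = FileChannel.open(replica, StandardOpenOption.WRITE)) {
        dn.truncate(0);  // frees the ramfs/tmpfs pages backing the data
      }
      // The descriptor is still valid, but no data bytes remain in memory;
      // reads see EOF instead of the old block contents.
      System.out.println("client sees length = " + client.length());  // 0
    } finally {
      Files.deleteIfExists(replica);
    }
  }
}
{code}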

bq. How is the maximum tmpfs/ramfs size per datanode configured? I think we 
should use the existing dfs.datanode.max.locked.memory property to configure 
this, for consistency. System administrators should not need to configure 
separate pools of memory for HDFS-4949 and this feature. It should be one 
memory size.
bq. Related to that, we might want to rename dfs.datanode.max.locked.memory to 
dfs.data.node.max.cache.memory or something.
The DataNode does not create the RAM disk itself, since we cannot require 
root privileges. An administrator will have to configure the partition.
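
To illustrate that division of labor, a sketch assuming the administrator has 
already mounted the partition (e.g. mount -t tmpfs -o size=2g tmpfs 
/mnt/dn-ram); the path and helper name are hypothetical, and the DataNode 
would only discover and validate the pre-configured mount:

{code:java}
import java.io.File;
import java.io.IOException;

public class RamDiskCheck {
  /**
   * Validate an administrator-configured RAM disk directory. The DataNode
   * cannot create the mount itself (that would require root), so it only
   * checks that the directory is usable and reports its capacity.
   */
  static long validateRamDisk(File dir) throws IOException {
    if (!dir.isDirectory() || !dir.canWrite()) {
      throw new IOException("RAM disk dir not usable: " + dir);
    }
    // For tmpfs, getTotalSpace() reflects the size= mount option, which is
    // what bounds how much replica data can live on this partition.
    return dir.getTotalSpace();
  }

  public static void main(String[] args) throws IOException {
    File ramDir = new File("/mnt/dn-ram");  // hypothetical mount point
    System.out.println("RAM disk capacity: " + validateRamDisk(ramDir));
  }
}
{code}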

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.


