[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106267#comment-14106267 ]

Colin Patrick McCabe commented on HDFS-6581:
--------------------------------------------

The key difference between tmpfs and ramfs is that unprivileged users can't be 
allowed write access to ramfs, since anyone who can write to ramfs can 
trivially fill up all of memory.  tmpfs, by contrast, has a kernel-enforced 
size limit and can swap pages out.  Since the design outlined here doesn't 
require giving unprivileged users write access to the temporary area, it is 
compatible with *both* tmpfs and ramfs.
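
To make that difference concrete: tmpfs advertises its size limit through 
statfs, while ramfs does not, so a quick capacity probe tells you which 
behavior you're getting.  This is just an illustrative sketch -- the mount 
point is hypothetical, and the "ramfs reports zero capacity" comment is my 
understanding of the kernel's statfs behavior, not anything from the design 
doc.

{code:java}
import java.io.File;

/**
 * Illustrative sketch only: probe the configured RAM disk mount.
 * "/mnt/dn-ramdisk" is a made-up path; use whatever directory the
 * administrator actually configured for the DataNode.
 */
public class RamDiskCapacityCheck {
  public static void main(String[] args) {
    File ramDisk = new File("/mnt/dn-ramdisk");

    // tmpfs reports its kernel-enforced size= limit via statfs, so
    // getTotalSpace() returns the configured ceiling.  ramfs has no such
    // limit and (as far as I know) reports zero total space here.
    long capacity = ramDisk.getTotalSpace();
    long free = ramDisk.getUsableSpace();

    if (capacity == 0) {
      System.out.println("No kernel-enforced limit (ramfs-style mount); "
          + "the DataNode has to enforce its own quota.");
    } else {
      System.out.printf("tmpfs-style mount: %d of %d bytes free%n",
          free, capacity);
    }
  }
}
{code}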

bq. I do prefer tmpfs as the OS limits tmpfs usage beyond the configured size 
so the failure case is safer (DiskOutOfSpace instead of exhaust all RAM). swap 
is not as much of a concern as it is usually disabled.

I can think of two cases where we might run out of memory:
1. The user configures the DN to use so much memory for cache that there is not 
enough memory to run other programs.

ramfs: causes applications to be aborted with OOM errors.
tmpfs: degrades performance to very slow levels by swapping out our "cached" 
files.

An OOM error is easy to diagnose.  Sluggish performance is not.  The ramfs 
behavior is better than the tmpfs behavior.

2. There is a bug in the DataNode causing it to try to cache more than it 
should.

ramfs: causes applications to be aborted with OOM errors.
tmpfs: degrades performance to very slow levels by swapping out our "cached" 
files.

The bug is easy to find when using ramfs, hard to find with tmpfs.

So I would say tmpfs is always worse for us.  Swapping is just not something 
we ever want, and memory limits are something we enforce ourselves, so tmpfs's 
features don't help us.

bq. Agreed that plain LRU would be a poor choice. Perhaps a hybrid of MRU+LRU 
would be a good option. i.e. evict the most recently read replica, unless there 
are replicas older than some threshold, in which case evict the LRU one. The 
assumption being that a client is unlikely to reread from a recently read 
replica.

Yeah, we'll probably need some benchmarking on this.
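
Just to make the proposal concrete, here's a rough sketch of that hybrid 
policy as I read it.  The class, the read-timestamp field, and the age 
threshold are all made up for illustration, not anything from the design doc.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Rough sketch of the hybrid MRU+LRU eviction idea; illustrative only. */
class HybridEvictionPolicy {

  /** Hypothetical per-replica state; lastReadMs is the last read timestamp. */
  static class ReplicaInfo {
    final long blockId;
    final long lastReadMs;
    ReplicaInfo(long blockId, long lastReadMs) {
      this.blockId = blockId;
      this.lastReadMs = lastReadMs;
    }
  }

  private final long ageThresholdMs;  // the "older than some threshold" knob

  HybridEvictionPolicy(long ageThresholdMs) {
    this.ageThresholdMs = ageThresholdMs;
  }

  /** Evict the LRU replica if anything is stale; otherwise evict the MRU one. */
  Optional<ReplicaInfo> chooseVictim(List<ReplicaInfo> replicas, long nowMs) {
    Optional<ReplicaInfo> lru = replicas.stream()
        .min(Comparator.comparingLong((ReplicaInfo r) -> r.lastReadMs));
    if (lru.isPresent() && nowMs - lru.get().lastReadMs > ageThresholdMs) {
      return lru;  // old enough that nobody is likely to reread it
    }
    // Otherwise pick the most recently read replica, on the assumption that
    // a client rarely rereads a replica it just finished reading.
    return replicas.stream()
        .max(Comparator.comparingLong((ReplicaInfo r) -> r.lastReadMs));
  }
}
{code}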

bq. Yes I reviewed the former, it looks interesting with eviction in mind. I'll 
create a subtask to investigate eviction via truncate.

Yeah, thanks for the review on HDFS-6750.  As Todd pointed out, we probably 
want to give clients some warning before the truncate in HDFS-6581, just like 
we do with HDFS-4949 and the munlock...
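
Purely to illustrate the shape of "warn first, then truncate", here's a 
sketch.  The notify step, the grace period, and the class itself are 
placeholders I made up; the real mechanism is whatever the subtask ends up 
specifying.

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of truncate-based eviction with a grace period; illustrative only. */
class TruncateEvictor {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void evict(Path replicaOnRamDisk, long graceMs) {
    // 1. Warn clients first (analogous to the munlock warning in HDFS-4949).
    //    How the warning is delivered is outside this sketch.
    notifyClients(replicaOnRamDisk);

    // 2. After the grace period, truncate the replica file so its RAM disk
    //    pages are actually released back to the system.
    scheduler.schedule(() -> {
      try (FileChannel ch = FileChannel.open(replicaOnRamDisk,
          StandardOpenOption.WRITE)) {
        ch.truncate(0);
      } catch (IOException e) {
        // A real DataNode would log this and retry or fall back.
      }
    }, graceMs, TimeUnit.MILLISECONDS);
  }

  private void notifyClients(Path replica) {
    // Placeholder for the client warning mechanism.
  }
}
{code}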

bq. The DataNode does not create the RAM disk since we cannot require root. An 
administrator will have to configure the partition.

Yeah, that makes sense.  Similarly, for HDFS-4949, the administrator must set 
the ulimit for the DataNode before caching can work.



> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.


