[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106267#comment-14106267 ]
Colin Patrick McCabe commented on HDFS-6581:
--------------------------------------------
The key difference between tmpfs and ramfs is that unprivileged users can't be
allowed write access to ramfs, since anyone who can write to ramfs can trivially
fill up all of memory. tmpfs, in contrast, has a kernel-enforced size limit and
can be swapped. Since the design outlined here doesn't require giving
unprivileged users write access to the temporary area, it is compatible with
*both* tmpfs and ramfs.
bq. I do prefer tmpfs as the OS limits tmpfs usage beyond the configured size
so the failure case is safer (DiskOutOfSpace instead of exhaust all RAM). swap
is not as much of a concern as it is usually disabled.
I can think of two cases where we might run out of memory:
1. The user configures the DN to use so much memory for cache that there is not
enough memory to run other programs.
ramfs: causes applications to be aborted with OOM errors.
tmpfs: degrades performance to very slow levels by swapping out our "cached"
files.
An OOM error is easy to diagnose. Sluggish performance is not. The ramfs
behavior is better than the tmpfs behavior.
2. There is a bug in the DataNode causing it to try to cache more than it
should.
ramfs: causes applications to be aborted with OOM errors.
tmpfs: degrades performance to very slow levels by swapping out our "cached"
files.
The bug is easy to find with ramfs, but hard to find with tmpfs.
So I would say, tmpfs is always worse for us. Swapping is just not something
we ever want, and memory limits are something we enforce ourselves, so tmpfs's
features don't help us.
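To make the "memory limits are something we enforce ourselves" point concrete, here is a rough sketch of the kind of DN-side accounting I mean (class, method, and config names are made up for illustration, not from the design doc): the DN tracks how many bytes of replicas it has placed on the RAM disk and refuses new in-memory writes once a configured budget is hit, rather than relying on tmpfs to stop us.
{code:java}
// Rough sketch only; the class, method, and config names are hypothetical, not from the design doc.
import java.util.concurrent.atomic.AtomicLong;

class RamDiskBudget {
  private final long maxBytes;                 // admin-configured budget for in-memory replicas
  private final AtomicLong usedBytes = new AtomicLong(0);

  RamDiskBudget(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  /** Try to reserve space for a new in-memory replica; false means fall back to a disk write. */
  boolean reserve(long replicaBytes) {
    while (true) {
      long used = usedBytes.get();
      if (used + replicaBytes > maxBytes) {
        return false;                          // DN-enforced limit, independent of tmpfs or ramfs
      }
      if (usedBytes.compareAndSet(used, used + replicaBytes)) {
        return true;
      }
    }
  }

  /** Release the reservation when a replica is evicted from the RAM disk. */
  void release(long replicaBytes) {
    usedBytes.addAndGet(-replicaBytes);
  }
}
{code}
With something like that in place, hitting the limit just means falling back to a normal disk write, which is a much nicer failure mode than either OOM kills or swapping.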
bq. Agreed that plain LRU would be a poor choice. Perhaps a hybrid of MRU+LRU
would be a good option. i.e. evict the most recently read replica, unless there
are replicas older than some threshold, in which case evict the LRU one. The
assumption being that a client is unlikely to reread from a recently read
replica.
Yeah, we'll probably need some benchmarking on this.
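Just to make the discussion concrete, here is roughly how I read the MRU+LRU hybrid you described (all names invented for illustration): if the least-recently-read replica has been idle longer than the threshold, evict it; otherwise evict the most-recently-read one, on the assumption that a client is unlikely to re-read a replica it just read.
{code:java}
// Discussion-only sketch of the MRU+LRU hybrid; all names are made up.
import java.util.List;

class HybridEvictionPolicy {
  private final long ageThresholdMs;           // replicas idle longer than this get evicted LRU-first

  HybridEvictionPolicy(long ageThresholdMs) {
    this.ageThresholdMs = ageThresholdMs;
  }

  /** Pick the replica to evict from a non-empty list of cached replicas. */
  CachedReplica pickVictim(List<CachedReplica> replicas, long nowMs) {
    CachedReplica lru = null;
    CachedReplica mru = null;
    for (CachedReplica r : replicas) {
      if (lru == null || r.lastReadMs < lru.lastReadMs) lru = r;
      if (mru == null || r.lastReadMs > mru.lastReadMs) mru = r;
    }
    // If the least-recently-read replica is older than the threshold, evict it (plain LRU).
    if (nowMs - lru.lastReadMs > ageThresholdMs) {
      return lru;
    }
    // Otherwise evict the most-recently-read replica, assuming the client won't re-read it soon.
    return mru;
  }

  static class CachedReplica {
    final long blockId;
    volatile long lastReadMs;                  // updated on every read of this replica

    CachedReplica(long blockId, long lastReadMs) {
      this.blockId = blockId;
      this.lastReadMs = lastReadMs;
    }
  }
}
{code}
The benchmarking question is then mostly about picking the age threshold.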
bq. Yes I reviewed the former, it looks interesting with eviction in mind. I'll
create a subtask to investigate eviction via truncate.
Yeah, thanks for the review on HDFS-6750. As Todd pointed out, we probably
want to give clients some warning before the truncate in HDFS-6581, just like
we do with HDFS-4949 and the munlock...
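For illustration only (none of this is existing DataNode code), the warn-then-truncate flow could look something like the following: mark the replica as being evicted so readers know it is going away, then truncate the block file after a grace period, similar in spirit to warning clients before the munlock in HDFS-4949.
{code:java}
// Hypothetical illustration of warn-then-truncate eviction; not an existing DataNode API.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TruncateEvictor {
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final long graceMs;                  // how long clients get before the data disappears

  TruncateEvictor(long graceMs) {
    this.graceMs = graceMs;
  }

  /** Step 1: mark the replica as going away so readers are warned; step 2: truncate after the grace period. */
  void evict(ReplicaOnRamDisk replica) {
    replica.evicting = true;                   // analogous to warning clients before the munlock in HDFS-4949
    scheduler.schedule(() -> {
      try (RandomAccessFile f = new RandomAccessFile(replica.blockFile, "rw")) {
        f.setLength(0);                        // truncating frees the RAM-disk pages
      } catch (IOException e) {
        // a real implementation would log and retry
      }
    }, graceMs, TimeUnit.MILLISECONDS);
  }

  static class ReplicaOnRamDisk {
    final File blockFile;
    volatile boolean evicting;                 // readers check this before serving the replica

    ReplicaOnRamDisk(File blockFile) {
      this.blockFile = blockFile;
    }
  }
}
{code}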
bq. The DataNode does not create the RAM disk since we cannot require root. An
administrator will have to configure the partition.
Yeah, that makes sense. Similarly, for HDFS-4949, the administrator must set
the ulimit for the DataNode before caching can work.
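As a purely hypothetical sanity check (not existing code), the DN could also verify at startup that the administrator-configured directory really is memory-backed, in the same spirit as checking the configured locked-memory limit for HDFS-4949:
{code:java}
// Hypothetical startup sanity check; not existing DataNode code.
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

class RamDiskCheck {
  /** Returns true if the administrator-configured directory sits on a memory-backed filesystem. */
  static boolean isMemoryBacked(String dir) throws IOException {
    FileStore store = Files.getFileStore(Paths.get(dir));
    String type = store.type();                // reports "tmpfs" or "ramfs" on Linux
    return "tmpfs".equals(type) || "ramfs".equals(type);
  }

  public static void main(String[] args) throws IOException {
    // "/mnt/dn-ram" is just a placeholder for whatever partition the administrator mounted
    System.out.println(isMemoryBacked("/mnt/dn-ram"));
  }
}
{code}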
> Write to single replica in memory
> ---------------------------------
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can
> revisit at a later time.
--
This message was sent by Atlassian JIRA
(v6.2#6252)