[
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106232#comment-14106232
]
Arpit Agarwal commented on HDFS-6581:
-------------------------------------
Thank you for taking the time to look at the doc and provide feedback.
bq. The problem with using tmpfs is that the system could move the data to swap
at any time. In addition to performance problems, this could cause correctness
problems later when we read back the data from swap (i.e. from the hard disk).
Since we don't want to verify checksums here, we should use a storage method
that we know never touches the disk. Tachyon uses ramfs instead of tmpfs for
this reason.
The implementation makes no assumptions about the underlying file system, whether
it is tmpfs or ramfs. I think renaming TMPFS to RAM as Gopal suggested will avoid
confusion. I do prefer tmpfs, since the OS caps tmpfs usage at the configured
size, so the failure case is safer (a DiskOutOfSpace error instead of exhausting
all RAM). Swap is not as much of a concern, as it is usually disabled.
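To illustrate the safer failure mode, here is a minimal sketch (illustrative
only, not from the patch; the class and method names are made up) of a
free-space check against the RAM disk mount. With tmpfs, the size= mount
option caps what getUsableSpace() reports, so an oversized write fails here
rather than eating into system RAM:
{code:java}
import java.io.File;
import java.io.IOException;

class RamDiskSpaceCheck {
  // Reject a block placement if the RAM disk mount lacks room for it.
  // On tmpfs, getUsableSpace() is bounded by the size= mount option, so
  // the failure surfaces as free-space exhaustion, not memory pressure.
  static void checkSpace(File ramDiskMount, long blockSize) throws IOException {
    long usable = ramDiskMount.getUsableSpace();
    if (usable < blockSize) {
      // Analogous to Hadoop's DiskChecker.DiskOutOfSpaceException.
      throw new IOException("Out of space on RAM disk " + ramDiskMount
          + ": need " + blockSize + " bytes, have " + usable);
    }
  }
}
{code}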
bq. An LRU replacement policy isn't a good choice. It's very easy for a batch
job to kick out everything in memory before it can ever be used again
(thrashing). An LFU (least frequently used) policy would be much better. We'd
have to keep usage statistics to implement this, but that doesn't seem too bad.
Agreed that plain LRU would be a poor choice. Perhaps a hybrid of MRU+LRU would
be a good option, i.e. evict the most recently read replica, unless there are
replicas older than some threshold, in which case evict the least recently used
one. The assumption is that a client is unlikely to reread a recently read
replica.
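A rough sketch of that hybrid policy (the class, data structure, and threshold
handling are illustrative assumptions, not the patch's code): keep replicas in
read order, evict the MRU replica by default, and fall back to LRU once the
oldest replica has aged past the threshold:
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

class HybridEvictionPolicy {
  static class Replica {
    final long blockId;
    long lastReadMillis;
    Replica(long blockId) {
      this.blockId = blockId;
      this.lastReadMillis = System.currentTimeMillis();
    }
  }

  // Head of the deque is the least recently read replica, tail the most.
  private final Deque<Replica> byReadTime = new ArrayDeque<>();
  private final long staleThresholdMillis;

  HybridEvictionPolicy(long staleThresholdMillis) {
    this.staleThresholdMillis = staleThresholdMillis;
  }

  void recordWrite(Replica r) {
    byReadTime.addLast(r);
  }

  void recordRead(Replica r) {
    byReadTime.remove(r); // O(n) here; a real implementation would link nodes
    r.lastReadMillis = System.currentTimeMillis();
    byReadTime.addLast(r);
  }

  /** Pick a replica to evict when the RAM disk is full. */
  Replica chooseVictim() {
    Replica lru = byReadTime.peekFirst();
    if (lru == null) {
      return null; // nothing cached
    }
    long age = System.currentTimeMillis() - lru.lastReadMillis;
    if (age > staleThresholdMillis) {
      return byReadTime.pollFirst(); // stale replica present: evict the LRU one
    }
    return byReadTime.pollLast();    // otherwise evict the most recently read
  }
}
{code}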
bq. You can effectively revoke access to a block file stored in ramfs or tmpfs
by truncating that file to 0 bytes. The client can hang on to the file
descriptor, but this doesn't keep any data bytes in memory. So we can move
things out of the cache even if the clients are unresponsive. Also see
HDFS-6750 and HDFS-6036 for examples of how we can ask the clients to stop
using a short-circuit replica before tearing it down.
Yes, I reviewed the former; it looks interesting with eviction in mind. I'll
create a subtask to investigate eviction via truncate.
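For reference, the truncate-based revocation described above amounts to
something like the following sketch (assuming the block file lives on
tmpfs/ramfs; the class and method names are made up):
{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class ReplicaRevoker {
  // Truncating the block file to zero bytes releases its pages even if a
  // client still holds an open file descriptor; the reader simply sees a
  // zero-length file on its next access, so memory can be reclaimed from
  // unresponsive clients.
  static void revoke(File blockFile) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(blockFile, "rw")) {
      raf.getChannel().truncate(0);
    }
  }
}
{code}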
bq. How is the maximum tmpfs/ramfs size per datanode configured? I think we
should use the existing dfs.datanode.max.locked.memory property to configure
this, for consistency. System administrators should not need to configure
separate pools of memory for HDFS-4949 and this feature. It should be one
memory size.
bq. Related to that, we might want to rename dfs.datanode.max.locked.memory to
dfs.data.node.max.cache.memory or something.
The DataNode does not create the RAM disk itself, since we cannot require root
privileges. An administrator will have to create and mount the partition.
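The DataNode could at least sanity-check the administrator-provided mount
against the shared memory budget. A hypothetical sketch (the helper and the
check itself are assumptions; only the dfs.datanode.max.locked.memory property
name is real):
{code:java}
import java.io.File;
import org.apache.hadoop.conf.Configuration;

class RamDiskBudgetCheck {
  // Hypothetical startup check: warn if the admin-mounted RAM disk is
  // larger than the budget in dfs.datanode.max.locked.memory, since the
  // two are meant to share one pool of memory.
  static void verify(Configuration conf, File ramDiskMount) {
    long maxLocked = conf.getLong("dfs.datanode.max.locked.memory", 0);
    long mountSize = ramDiskMount.getTotalSpace();
    if (maxLocked > 0 && mountSize > maxLocked) {
      System.err.println("RAM disk " + ramDiskMount + " (" + mountSize
          + " bytes) exceeds dfs.datanode.max.locked.memory ("
          + maxLocked + " bytes)");
    }
  }
}
{code}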
> Write to single replica in memory
> ---------------------------------
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can
> revisit at a later time.
--
This message was sent by Atlassian JIRA
(v6.2#6252)