[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106245#comment-14106245 ]

Andrew Wang commented on HDFS-6581:
-----------------------------------

I also took a look at the doc; it seems pretty reasonable. I had a few 
questions as well.

* Related to Colin's point about configuring separate pools of memory on the 
DN, I'd really like to see integration with the cache pools from HDFS-4949. 
Memory is ideally shareable between HDFS and YARN, and cache pools were 
designed with that in mind. Simple storage quotas do not fit as well.
* Quotas are also a very rigid policy and can result in underutilization. 
Cache pools are more flexible, and can be extended to support fair share and 
more complex policies. Avoiding underutilization seems especially important for 
a limited resource like memory.
* Do you have any benchmarks? For the read side, we found checksum overhead to 
be substantial, essentially the cost of a copy. If we use tmpfs, it can swap, 
so we're forced to calculate checksums at both write and read time. My guess is 
also that a normal 1-replication write will be fairly fast because of the OS 
buffer cache, so it'd be nice to quantify the potential improvement.
* There's a mention of LAZY_PERSIST having a config option to unlink corrupt 
TMP files. It seems better for this to be per-file rather than NN-wide, since 
different clients might want different behavior.
* Section 5.2.2 lists a con of mmapped files as not having control over page 
writeback. Is this actually true when using mlock? I'm also not sure why memory 
pressure is worse with mmapped files compared to tmpfs. mmap might make 
eviction+SCR nicer too, since you can just drop the mlocks if you want to 
evict, and the client has a hope of falling back gracefully.
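
To make the quota vs. fair-share point above concrete, here is a minimal 
Python sketch. It is not HDFS or cache pool code: the pool names, the demand 
and quota numbers, and both functions are hypothetical, and max-min fair share 
is just one policy a pool-based design could plausibly adopt.

```python
# Hypothetical illustration only; not taken from HDFS or HDFS-4949.

def static_quota(demands, quotas):
    """Each pool gets at most its fixed quota, even if other memory sits idle."""
    return {pool: min(demands[pool], quotas[pool]) for pool in demands}


def max_min_fair_share(demands, capacity):
    """Max-min fair share: split the remaining capacity evenly among pools
    that still want more, and redistribute whatever satisfied pools leave."""
    alloc = {pool: 0 for pool in demands}
    active = {pool for pool, demand in demands.items() if demand > 0}
    remaining = capacity
    while active and remaining > 0:
        share = remaining / len(active)
        satisfied = {p for p in active if demands[p] - alloc[p] <= share}
        if not satisfied:
            # Nobody can be fully satisfied: hand out equal shares and stop.
            for p in active:
                alloc[p] += share
            remaining = 0
            break
        for p in satisfied:
            remaining -= demands[p] - alloc[p]
            alloc[p] = demands[p]
        active -= satisfied
    return alloc


# Assumed numbers: 12 GB of DN memory, three pools, a 4 GB quota per pool.
capacity_gb = 12
demands_gb = {"A": 2, "B": 4, "C": 10}
quotas_gb = {"A": 4, "B": 4, "C": 4}

quota_alloc = static_quota(demands_gb, quotas_gb)
fair_alloc = max_min_fair_share(demands_gb, capacity_gb)

# Memory left idle under each policy while pool C still has unmet demand.
quota_idle_gb = capacity_gb - sum(quota_alloc.values())
fair_idle_gb = capacity_gb - sum(fair_alloc.values())
```

With these assumed numbers, the static quotas leave 2 GB idle even though pool 
C wants more, while the fair-share allocation hands that 2 GB to pool C and 
leaves nothing idle.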

HSM-related questions
* Caveat: I'm not sure what the HSM APIs will look like, or how this will be 
integrated, so some of these might be out of scope.
* Will we support changing a file from DISK storage type to TMP storage type? I 
would say no, since cache directives seem better for read caching when 
something is already on disk.
* Will we support writing a file on both TMP and another storage type? Similar 
to the above, it also doesn't feel that useful.

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFSWriteableReplicasInMemory.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
