[jira] [Commented] (HDFS-6581) Write to single replica in memory

Arpit Agarwal (JIRA) Mon, 22 Sep 2014 13:42:07 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143773#comment-14143773
 ]


Arpit Agarwal commented on HDFS-6581:
-------------------------------------

bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so. I'm hoping that this fear is not justified, but until there is an actual 
LFU or cold/warm/hot scheme implemented, we won't know for sure. As you said, 
this isn't much code, so maybe I'll do it if it remains to be done later.
Colin, LFU may work better for a general-purpose cache, but this feature 
targets a specific use case: smaller intermediate data. Intermediate data is 
likely to be read once or only a few times, so it is unlikely to fit the 
typical LFU use case; in fact NFU may be better. IMO, without real-world 
evaluation there is no data to support one over the other. Let's help HDFS 
clients evaluate it.

bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so.
I don't see any reason to fear. The interface is tagged private and the 
interactions with DN are in limited portions of the FsDataset code. It will be 
easy to update if needed.
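
To make the eviction-policy discussion concrete, here is a minimal Java sketch of two policies behind a small plug-in interface. The names EvictionPolicy, LruPolicy and LfuPolicy are hypothetical and are not the interface introduced in HDFS-7100; the sketch only illustrates how two policies can choose different victims for the same access history.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical plug-in point for illustration; NOT the HDFS-7100 interface. */
interface EvictionPolicy {
    void recordAccess(long blockId);  // assumed to be called on write and on each read
    long chooseVictim();              // block to evict next, or -1 if nothing is tracked
    void remove(long blockId);
}

/** LRU: evict the block whose most recent access is the oldest. */
class LruPolicy implements EvictionPolicy {
    // accessOrder=true moves an entry to the tail on every put/get,
    // so the head of the map is always the least recently used block.
    private final LinkedHashMap<Long, Boolean> order =
        new LinkedHashMap<>(16, 0.75f, true);

    public void recordAccess(long blockId) { order.put(blockId, Boolean.TRUE); }

    public long chooseVictim() {
        return order.isEmpty() ? -1 : order.keySet().iterator().next();
    }

    public void remove(long blockId) { order.remove(blockId); }
}

/** LFU: evict the block with the fewest recorded accesses. */
class LfuPolicy implements EvictionPolicy {
    private final Map<Long, Integer> counts = new HashMap<>();

    public void recordAccess(long blockId) { counts.merge(blockId, 1, Integer::sum); }

    public long chooseVictim() {
        long victim = -1;
        int min = Integer.MAX_VALUE;
        // Linear scan keeps the sketch short; ties go to whichever entry is seen first.
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                victim = e.getKey();
            }
        }
        return victim;
    }

    public void remove(long blockId) { counts.remove(blockId); }
}
```

With the access sequence block 1, block 1, block 2, the LRU sketch picks block 1 (oldest last access) while the LFU sketch picks block 2 (fewest accesses). Which choice is better for read-once intermediate data is exactly the question that needs real-world evaluation.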

bq. to get a benchmark that makes you look better  Clearly the lazy-persist 
file will still be in RAM after caches are dropped, whereas the non-lazy one 
will not. I always repeat experiments 3 times and average, I left that out for 
brevity
Thanks for the idea, it might be useful for future testing. For now I trigger 
the best-case scenario for non-lazy persist (data already in buffer cache) just 
to demonstrate that performance is on par, as we'd expect since we're doing SCR 
from RAM in either case. The numbers are means over 1000 runs, discarding the 
initial sacrificial read that fetches block data into the buffer cache.
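
The measurement method described above (one untimed warm-up read, then the mean over many timed reads) can be sketched roughly as follows. This is not the harness used for the reported numbers; ReadBenchmark and its method names are hypothetical, and the Runnable stands in for whatever performs the actual read.

```java
/** Hypothetical timing helper; an illustration, not the actual test harness. */
class ReadBenchmark {

    /** Mean of all samples except the first, which is treated as the warm-up. */
    static double meanDiscardingFirst(long[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("need a warm-up sample plus at least one run");
        }
        long total = 0;
        for (int i = 1; i < samples.length; i++) {
            total += samples[i];
        }
        return (double) total / (samples.length - 1);
    }

    /**
     * Performs one sacrificial read so the data is already cached,
     * then times `runs` further reads and returns the mean in nanoseconds.
     */
    static double meanNanosAfterWarmup(Runnable read, int runs) {
        long[] samples = new long[runs + 1];
        for (int i = 0; i <= runs; i++) {
            long start = System.nanoTime();
            read.run();
            samples[i] = System.nanoTime() - start;
        }
        return meanDiscardingFirst(samples);
    }
}
```

A call such as `ReadBenchmark.meanNanosAfterWarmup(readFile, 1000)` would then correspond to "means over 1000 runs", where `readFile` is an assumed Runnable that reads the file under test.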

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
