[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143698#comment-14143698
 ] 

Colin Patrick McCabe commented on HDFS-6581:
--------------------------------------------

bq. Rebase is no silver bullet. Conflicts still need to be resolved manually. 
Colin, explaining how to use git is a little condescending.

I apologize if it sounded condescending.  I was just trying to point out that 
the cost of maintaining a branch has gone down due to the switch to git.

bq. It would be very little code to convert the current FIFO approach to 
something like LFU but writing code is easy and demonstrating it actually helps 
HDFS clients is harder. For your caching feature the measurement was fairly 
straightforward. The logic for deciding which replicas need to be in memory was 
outside HDFS. For this feature we'd first need to define "better scheme" and 
we'd need help from other stack components for evaluation. Think of this 
feature as providing the overall framework including API, protocol changes and 
DN support. If there is no argument with the framework design then there should 
be no objection to doing the eviction fine tuning (which is a very small 
proportion of the patch, perhaps less than 5% content wise) post-merge. And to 
restate, we cannot get clients to start evaluating it until the changes are in 
mainline.

I agree that testing is needed, and it will be time-consuming.  But I don't 
understand why LRU was implemented first.  It's very well-known that LRU is a 
poor fit for scan workloads, which most HDFS workloads are.

My fear here is that we will try to implement a better eviction strategy, but 
find that the pluggable API introduced in HDFS-7100 is too inflexible to do so. 
 I'm hoping that this fear is not justified, but until there is an actual LFU 
or cold/warm/hot scheme implemented, we won't know for sure.  As you said, this 
isn't much code, so maybe I'll do it if it remains to be done later.

bq. Colin Patrick McCabe, I keep hearing usable eviction strategy and better 
eviction strategy. What is it? How do you decide it is better or usable? We 
should make sure the policy we go with is decent enough. I agree Fifo is not 
it. As regards to other approaches and improvements, one can certainly make it 
available using the plugin approach.

That's a good point.  I think system-level testing will be needed.  I think 
it's fine to merge without this system-level testing being done, but I want 
there to be at least one non-LRU implementation of eviction so that we know 
that it's possible within this framework.  Basically validating the plugin 
architecture.

bq. Micro-benchmark to verify that SCR performance does not suffer with this 
feature.

Thank you, Arpit.  You might also consider using:

{code}
sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches"
time hadoop fs -cat /my/non-lazy-persist-file
sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches"
time hadoop fs -cat /my/lazy-persist-file
{code}

to get a benchmark that makes you look better :)  Clearly the lazy-persist file 
will still be in RAM after caches are dropped, whereas the non-lazy one will 
not.  I always repeat experiments 3 times and average the results; I left that 
out for brevity.
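
For completeness, here is a rough sketch of the repeat-and-average step that was 
left out above.  The helper names (average, time_cmd, bench) and the RUNS 
variable are made up for illustration; it assumes GNU date (for %N), awk, and a 
single-node setup where the client and the DataNode share a host, so that 
dropping the local page cache actually affects the replica being read.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Hadoop.

# average N1 N2 ...: print the arithmetic mean of the arguments, 3 decimals.
average() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%.3f\n", sum / NR }'
}

# time_cmd CMD ARGS...: run CMD, discard its stdout, print elapsed wall-clock
# seconds. Relies on GNU date's %N (nanoseconds), which is an assumption.
time_cmd() {
    start=$(date +%s.%N)
    "$@" > /dev/null
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# bench FILE: drop the page cache, read FILE through the HDFS client, repeat
# RUNS times (default 3), and print the mean read time in seconds.
bench() {
    file=$1
    runs=${RUNS:-3}
    times=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
        times="$times $(time_cmd hadoop fs -cat "$file")"
        i=$((i + 1))
    done
    # Unquoted on purpose so each timing becomes its own argument.
    average $times
}

# Usage (paths taken from the example above):
#   bench /my/non-lazy-persist-file
#   bench /my/lazy-persist-file
```

The script only defines functions; nothing runs until bench is called.  On a 
multi-node cluster the cache drop would have to happen on the DataNode holding 
the replica, which this sketch does not attempt.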

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
