[jira] [Commented] (HDFS-6581) Write to single replica in memory

Arpit Agarwal (JIRA) Tue, 23 Sep 2014 16:19:02 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145599#comment-14145599
 ]


Arpit Agarwal commented on HDFS-6581:
-------------------------------------

Preliminary numbers for write throughput.

The test creates, writes, and closes 3 x 2GB files in quick succession and 
computes the mean end-to-end (E2E) time per file. Looking at raw throughput 
alone makes memory writes look even better.
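
For readers who want to reproduce the shape of this measurement, here is a minimal sketch of a create/write/close timing loop. It is not the harness used for the numbers in this comment: it writes to a local directory rather than through the HDFS client, and the function name `mean_e2e_ms`, the file names, and the 1MB write chunk are assumptions made for illustration.

```python
import os
import tempfile
import time


def mean_e2e_ms(directory, num_files=3, file_size=2 * 1024**3, chunk_size=1024**2):
    """Create, write and close num_files files back to back.

    Returns the mean end-to-end time per file in milliseconds.
    Hypothetical helper: it targets a local directory, not HDFS.
    """
    chunk = b"\0" * chunk_size
    elapsed_ms = []
    for i in range(num_files):
        path = os.path.join(directory, f"testfile-{i}")
        start = time.perf_counter()
        with open(path, "wb") as f:      # create
            remaining = file_size
            while remaining > 0:         # write
                n = min(remaining, chunk_size)
                f.write(chunk[:n])
                remaining -= n
        # Leaving the with-block closes the file, so close time is included.
        elapsed_ms.append((time.perf_counter() - start) * 1000.0)
        os.remove(path)                  # cleanup happens outside the timed region
    return sum(elapsed_ms) / len(elapsed_ms)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Small files so the sketch finishes quickly; the test above used 3 x 2GB.
        mean_ms = mean_e2e_ms(d, num_files=3, file_size=8 * 1024**2)
        print(f"mean E2E: {mean_ms:.1f} ms")
```

Pointing `directory` at a RAM disk mount versus a regular disk gives a rough local analogue of the comparison, though without the DataNode write pipeline in the path.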

System RAM: 24GB
RAM Disk: 8GB

*Baseline, checksums ON*
||Block Size (MB)||Mean E2E Latency (ms)||
|128|7235|
|1024|7005|

*Lazy Persist, checksums ON*
||Block Size (MB)||Mean E2E Latency (ms)||Improvement over baseline||
|128|5015|30.6%|
|1024|4635|33.8%|

*Lazy Persist, checksums OFF*
||Block Size (MB)||Mean E2E Latency (ms)||Improvement over baseline||
|128|4504|37.7%|
|1024|4240|39.4%|
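
The "Improvement over baseline" column can be rechecked from the latencies alone. The sketch below assumes improvement is defined as (baseline - measured) / baseline; the function and dictionary names are illustrative, and the latencies are copied from the tables above.

```python
def improvement_pct(baseline_ms, measured_ms):
    """Percentage reduction in latency relative to the baseline."""
    return (baseline_ms - measured_ms) / baseline_ms * 100.0


# Mean E2E latencies (ms) from the tables, keyed by block size in MB.
BASELINE_MS = {128: 7235, 1024: 7005}
RESULTS_MS = {
    "Lazy Persist, checksums ON": {128: 5015, 1024: 4635},
    "Lazy Persist, checksums OFF": {128: 4504, 1024: 4240},
}

for label, by_block in RESULTS_MS.items():
    for block_mb, ms in by_block.items():
        pct = improvement_pct(BASELINE_MS[block_mb], ms)
        print(f"{label}, {block_mb}MB blocks: {pct:.2f}%")
```

Under that definition, each computed value falls within 0.1 percentage point of the figure in the table; the small differences are consistent with the table truncating rather than rounding.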

The baseline times varied widely across runs, so I picked the best number. 
If the buffer cache happens to be dirty, which will be common in practice, 
the disk write time degrades to 20s for a 2GB file (100MB/s, which happens to 
be the disk write throughput). Correspondingly, if the RAM disk is full of 
dirty data and the lazy writer cannot keep up, the memory numbers will suffer. 
Another potential improvement afforded by writing to RAM disk is that the lazy 
writer can use unbuffered disk writes, which avoid churning the buffer cache 
(HDFS-7090). We cannot make a corresponding fix in our existing data write 
pipeline, as the best-case write latency would suffer significantly.
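
The 100MB/s figure follows directly from the file size and the degraded write time. A short sketch of the arithmetic, assuming "2GB" means 2048MB and using the best baseline latency from the table (7005 ms) for comparison; the names are illustrative, and because E2E time includes create and close, these are effective rates rather than raw write throughput.

```python
def throughput_mb_per_s(size_mb, elapsed_s):
    """Effective throughput for writing size_mb megabytes in elapsed_s seconds."""
    return size_mb / elapsed_s


FILE_SIZE_MB = 2 * 1024  # assumption: "2GB" means 2048 MB

# Dirty buffer cache case: 20 s per 2GB file, about 100 MB/s.
dirty_cache = throughput_mb_per_s(FILE_SIZE_MB, 20.0)

# Best-case baseline from the table: 7005 ms per file with 1024MB blocks.
best_baseline = throughput_mb_per_s(FILE_SIZE_MB, 7005 / 1000.0)

print(f"dirty buffer cache: {dirty_cache:.1f} MB/s")
print(f"best-case baseline: {best_baseline:.1f} MB/s")
```

The gap between the two rates shows how much the best-case baseline depends on the buffer cache absorbing the writes.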

> Write to single replica in memory
> ---------------------------------
>
>                 Key: HDFS-6581
>                 URL: https://issues.apache.org/jira/browse/HDFS-6581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, 
> HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
