[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888180#comment-15888180
 ] 

Eshcar Hillel commented on HBASE-16417:
---------------------------------------

To measure write amplification in our benchmark, I am trying to capture the 
total size of the data written to the WAL during the experiment.
I do this by grepping for log lines that contain both "filesize" and "wal" and 
summing the values that follow "filesize=".
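
The counting described above can be sketched roughly as follows. This is only 
an illustration: the function name total_wal_bytes is hypothetical, and the 
assumed log format ("filesize=" followed by a number and an optional unit such 
as "MB", with binary units) should be checked against the actual region server 
log lines.

```python
import re

# Assumed format: "filesize=<number>[ <unit>]", e.g. "filesize=121.0 MB".
# A bare number is treated as bytes; units are treated as binary (1 MB = 1024**2 B).
FILESIZE_RE = re.compile(r"filesize=(\d+(?:\.\d+)?)\s*([KMGT]?B)?", re.IGNORECASE)
UNIT_BYTES = {None: 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def total_wal_bytes(lines):
    """Sum the filesize= values of lines that mention both "wal" and "filesize"."""
    total = 0.0
    matched = 0
    for line in lines:
        if "wal" not in line.lower():
            continue  # skip flush/compaction lines that also report a filesize
        m = FILESIZE_RE.search(line)
        if m is None:
            continue
        unit = m.group(2).upper() if m.group(2) else None
        total += float(m.group(1)) * UNIT_BYTES[unit]
        matched += 1
    return total, matched


if __name__ == "__main__":
    import sys

    # Usage: python sum_wal.py < regionserver.log
    total, matched = total_wal_bytes(sys.stdin)
    print(f"{matched} matching lines, {total / 1024**3:.1f} GB in total")
```

Returning the number of matching lines next to the total makes it easier to 
cross-check the result against the file counts reported below.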

I need help explaining the numbers I get.

I ran in both synchronous and asynchronous WAL modes. Recall that I write 
100GB in the write-only experiments.
(1) In sync mode, roughly 200GB (!) is written to the WAL under all in-memory 
compaction policies. In every case I see 1673 files of 121MB each.
Is this reasonable? 
Could it be due to double logging of the same information?
Should I expect only 100GB in the WAL? 
Could it be due to alignment (my values are small, 100B each)? 
Do you know of any duplication in WAL processing? 
Note that I count only the sizes written to HDFS and do not account for the 
3-way replication done at the data node level.
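
As a sanity check on the sync-mode total, the arithmetic can be written out as 
below. The variable names are only illustrative, and whether "MB" and "GB" are 
binary or decimal units in the logs is an assumption, so both conversions are 
shown.

```python
# Sync-mode numbers taken from the comment: 1673 files of 121MB each.
FILES = 1673
FILE_SIZE_MB = 121
PAYLOAD_GB = 100  # data written by the write-only experiment

total_mb = FILES * FILE_SIZE_MB   # 202433 MB
total_gb_binary = total_mb / 1024  # if 1 GB = 1024 MB
total_gb_decimal = total_mb / 1000  # if 1 GB = 1000 MB

print(f"total written to WAL: {total_mb} MB")
print(f"binary units:  {total_gb_binary:.1f} GB, ratio {total_gb_binary / PAYLOAD_GB:.2f}x")
print(f"decimal units: {total_gb_decimal:.1f} GB, ratio {total_gb_decimal / PAYLOAD_GB:.2f}x")
```

Under either convention the WAL volume is about twice the 100GB payload, which 
matches the "roughly 200GB" figure.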
  
(2) In async mode I get different numbers: NONE/BASIC write 189GB, EAGER 
writes 124GB.
Here the file sizes vary; NONE/BASIC write roughly 850 files, EAGER roughly 
480.
Can you explain the difference in the data written to the WAL in sync mode vs. 
async mode with no compaction?
Could it be due to compression when writing batches of WAL entries?
Can the reduced number of files written in EAGER mode be explained by WAL 
truncation done after in-memory compaction?
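
For comparison with the fixed 121MB files seen in sync mode, the average file 
size in async mode can be estimated from the rounded totals above. This is a 
rough sketch only: the file counts are approximate, and 1 GB = 1024 MB is an 
assumption.

```python
# Async-mode numbers from the comment: (total GB written, approximate file count).
GB_IN_MB = 1024  # assumption: binary units
async_runs = {
    "NONE/BASIC": (189, 850),
    "EAGER": (124, 480),
}

# Average WAL file size per policy, in MB.
avg_mb = {
    policy: total_gb * GB_IN_MB / files
    for policy, (total_gb, files) in async_runs.items()
}

for policy, size in avg_mb.items():
    print(f"{policy}: about {size:.0f} MB per file")
```

With these inputs the async-mode files average roughly twice the 121MB seen in 
sync mode.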

I realize these are a lot of questions; any input would help.
Thanks!

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf, 
> HBASE-16417-benchmarkresults-20161205.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
