[
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888180#comment-15888180
]
Eshcar Hillel commented on HBASE-16417:
---------------------------------------
To measure write amplification in our benchmark I'm trying to capture the total
size of data that is written to WAL during the experiment.
I do so by grep-ing log lines with both "filesize" and "wal" and adding the
values written after "filesize=".
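For reference, the counting step can be sketched roughly as below. This is
only an illustration: the function name total_wal_bytes is hypothetical, the
log format ("filesize=<number> <unit>") is an assumption about what the region
server prints, and units are assumed to be binary (1 MB = 1024 * 1024 bytes).

```python
import re

# Assumed format: "filesize=121.03 MB" (number, optional space, optional unit).
_FILESIZE = re.compile(r"filesize=(\d+(?:\.\d+)?)\s*([KMGT]?B)?", re.IGNORECASE)

# Assumption: units are binary multiples of a byte.
_UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def total_wal_bytes(lines):
    """Return (matching_line_count, total_bytes) for lines mentioning
    both "wal" and "filesize" (case-insensitive)."""
    count = 0
    total = 0.0
    for line in lines:
        lower = line.lower()
        if "wal" not in lower or "filesize" not in lower:
            continue
        match = _FILESIZE.search(line)
        if match is None:
            continue
        value, unit = match.groups()
        # A missing unit is treated as plain bytes.
        factor = _UNIT_BYTES[unit.upper()] if unit else 1
        total += float(value) * factor
        count += 1
    return count, total


if __name__ == "__main__":
    sample = [
        "INFO Rolled WAL /a with entries=10, filesize=121.00 MB; new WAL /b",
        "INFO Flushed store file, filesize=5 MB",  # no "wal": skipped
    ]
    files, size = total_wal_bytes(sample)
    print(files, size / 1024**3)
```

Lines that mention "filesize" without "wal" (for example store file flushes)
are skipped, which matches the grep described above.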
I need help in explaining the numbers I get.
I run in both synchronous and asynchronous WAL modes; recall that I write
100GB in the write-only experiments.
(1) In sync mode I get roughly 200GB (!) written to the WAL, under all
in-memory compaction policies. In all cases there are 1673 files of 121MB each.
Is this reasonable?
Could it be due to double logging of the same information?
Should I expect only 100GB in wal?
Could it be due to alignment (my values are small -- 100B)?
Do you know of any duplication in wal processing?
Obviously I count only the sizes written to HDFS and do not consider the 3-way
replication done at the data node level.
(2) In async mode I get different numbers: NONE/BASIC 189GB, EAGER 124GB.
Here the sizes of the files vary, NONE/BASIC write roughly 850 files, EAGER
roughly 480.
Can you explain the difference in the data written to wal in sync mode vs async
mode with no compaction?
Could it be due to compression when writing batches of wal entries?
Can the reduced number of files written in EAGER mode be explained by WAL
truncation done after in-memory compaction?
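To make the numbers above easier to compare, here is a small sketch of the
arithmetic. It uses only the figures quoted in this comment; the helper names
are hypothetical, and binary units (1 GB = 1024 MB) are an assumption about
how the sizes were reported.

```python
MB_PER_GB = 1024   # assumption: binary units
PAYLOAD_GB = 100   # data written by the client in the write-only runs


def wal_gb(file_count, file_mb):
    """Total WAL volume in GB for file_count files of file_mb each."""
    return file_count * file_mb / MB_PER_GB


def amplification(wal_total_gb, payload_gb=PAYLOAD_GB):
    """WAL bytes written per byte of client payload."""
    return wal_total_gb / payload_gb


def avg_file_mb(wal_total_gb, file_count):
    """Average WAL file size in MB."""
    return wal_total_gb * MB_PER_GB / file_count


if __name__ == "__main__":
    sync_total = wal_gb(1673, 121)
    print(f"sync: {sync_total:.2f} GB, x{amplification(sync_total):.2f}")

    # Async totals and approximate file counts as reported above.
    for policy, (total_gb, files) in {
        "NONE/BASIC": (189, 850),
        "EAGER": (124, 480),
    }.items():
        print(
            f"async {policy}: x{amplification(total_gb):.2f}, "
            f"avg file {avg_file_mb(total_gb, files):.1f} MB"
        )
```

Under these assumptions the sync runs come out at just under 2x the payload,
and the async files are on average larger than the 121MB sync files.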
I realize these are a lot of questions, any input can help here.
Thanks!!
> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
> Issue Type: Sub-task
> Reporter: Anastasia Braginsky
> Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf,
> HBASE-16417-benchmarkresults-20161110.pdf,
> HBASE-16417-benchmarkresults-20161123.pdf,
> HBASE-16417-benchmarkresults-20161205.pdf
>
>