[
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655444#comment-15655444
]
Eshcar Hillel commented on HBASE-16417:
---------------------------------------
While running the benchmarks this week I realized I made a mistake when running
data compaction in previous rounds. I turned off the mslab flag but did not
remove the chunk pool parameters, so a chunk pool was allocated but never used.
I re-ran these experiments this week with no mslabs and no chunk pool, and
indeed performance improved. For a fair comparison I also ran the no-compaction
option with no mslabs and no chunk pool, which turned out to be the best
performing setting. (See full details in the latest report.)
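For reference, a sketch of the hbase-site.xml settings involved (assuming the pre-2.0 property names; please verify against your version):

```xml
<!-- Disable MSLAB entirely. Leave the chunk pool settings unset (or 0.0) as
     well; otherwise a pool may be allocated but never used, as happened in
     the earlier rounds. -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>false</value>
</property>
<!-- Chunk pool size as a fraction of the memstore upper limit; 0.0 disables it. -->
<property>
  <name>hbase.hregion.memstore.chunkpool.maxsize</name>
  <value>0.0</value>
</property>
```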
The focus of this week's benchmarks was a mixed workload: 50% reads, 50% writes.
Results show that in a mixed workload, running with no mslabs and no chunk pool
has a significant advantage over running with mslabs and a chunk pool. This
holds both with no compaction and with data compaction.
So far the benchmarks do not show an advantage of index-/data-compaction over
no-compaction. This might be due to several reasons:
1. Running index-/data-compaction should reduce the number of disk compactions
- but the price tag of a disk compaction in the current system (single SSD
machine) is not as high as it would be in a production cluster.
2. Index compaction should have a greater effect as the size of the cells
decreases - the values we are using now are medium size (1KB), not small.
3. Index-/data-compaction should result in more reads being served from memory,
thereby reducing read latency - we might be using too small a data set, one
that is served efficiently from the block cache; this is not always the case
with production data sets.
4. Index-/data-compaction should result in more reads being served from memory,
thereby reducing read latency - however, the current read implementation
*always* seeks the key in all store files that may contain it, even if it
resides in memory, effectively masking any memory optimization, including
in-memory compaction.
Directions we intend to explore next:
1. Run benchmarks on commodity machines (namely HDD rather than SSD); run the
cluster on more than one machine (2 RSs, 3-way replication). The scale might be
smaller, though, since our HDD machines are modest compared to the SSD machine
we have.
2. Run with smaller values - 100B instead of 1KB.
3. Run with bigger data sets - 10-20M keys instead of 5M keys.
4. Change the read (get) implementation to first seek the key in the
memstore(s) only, and only if no matching entry is found seek in all memstore
segments and all relevant store files. This could be the subject of another
Jira. We believe this would be beneficial even with no compaction, and even
more so when index-/data-compaction is employed. Any thoughts on this
direction?
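To make point 4 concrete, here is a minimal sketch of the proposed two-phase read path - plain Java with maps standing in for segments and store files. This is not the HBase API, and it ignores versioning and client-set timestamps, which the real implementation would need to handle:

```java
import java.util.List;
import java.util.Map;

// Sketch of the proposed two-phase get: probe the in-memory segments first,
// and fall back to store files only when the key is not found in memory.
public class MemstoreFirstGet {

    // Each segment/file is modeled as a key->value map, ordered newest-first.
    public static String get(String key,
                             List<Map<String, String>> memstoreSegments,
                             List<Map<String, String>> storeFiles) {
        // Phase 1: memory only. If the key is found here we never touch the
        // store files - which is where index-/data-compaction should pay off.
        for (Map<String, String> segment : memstoreSegments) {
            String value = segment.get(key);
            if (value != null) {
                return value;
            }
        }
        // Phase 2: fall back to all relevant store files.
        for (Map<String, String> file : storeFiles) {
            String value = file.get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // key not present anywhere
    }
}
```

One caveat the real change would have to address: with client-specified timestamps, a store file can hold a newer version of a key than the memstore, so the memory-only fast path is only safe when the memstore entry is known to be the newest.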
Finally, a small note: a small bug was found that prevents index-compaction
from running without mslabs. It is about to be fixed in a new patch Anastasia
is working on.
> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
> Issue Type: Sub-task
> Reporter: Anastasia Braginsky
> Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf,
> HBASE-16417-benchmarkresults-20161110.pdf
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)