[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828439#comment-13828439 ]
Vladimir Rodionov commented on HBASE-10015:
-------------------------------------------
Maybe I am wrong (an empty synchronized method call costs 25 ns on my laptop),
but my own tests on StoreScanner show 0 improvement.
The code is simple:
create a region, populate it with data (make sure the data is in the cache), then
{code}
LOG.info("Test store scanner");
// Scan the whole region, from its first key to its last.
Scan scan = new Scan();
scan.setStartRow(region.getStartKey());
scan.setStopRow(region.getEndKey());
Store store = region.getStore(CF);
// Build the StoreScanner directly, bypassing the region-level scanner.
StoreScanner scanner = new StoreScanner(store, store.getScanInfo(), scan, null);
long start = System.currentTimeMillis();
int total = 0;
List<KeyValue> result = new ArrayList<KeyValue>();
// Count the next() calls that report more rows; reuse the result list.
while (scanner.next(result)) {
  total++;
  result.clear();
}
LOG.info("Test store scanner finished. Found " + total + " in "
    + (System.currentTimeMillis() - start) + "ms");
{code}
This test shows exactly the same time for both the default StoreScanner and
the *unsynchronized* StoreScanner. The scan is not very fast: 1-1.5M rows per sec
(rows are relatively small: 1 CF + 5 CQ, ~120 bytes).
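The per-call cost of an uncontended synchronized method, like the 25 ns figure mentioned above, can be estimated with a rough micro-benchmark. The following is only a sketch: the class name SyncCallCost and its methods are made up for illustration, it assumes Java 8 or later for method references, and a naive timing loop like this is easily distorted by JIT optimizations such as lock elision, so a harness like JMH would be needed for numbers to rely on.

```java
final class SyncCallCost {
    private long counter;

    // Uncontended synchronized method: only one thread ever takes the lock.
    synchronized void incrementSynchronized() {
        counter++;
    }

    // Same work without acquiring the monitor.
    void incrementPlain() {
        counter++;
    }

    long counter() {
        return counter;
    }

    // Runs the call 'iterations' times and returns the average nanoseconds per call.
    static double nanosPerCall(Runnable call, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            call.run();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        SyncCallCost target = new SyncCallCost();
        int iterations = 10_000_000;

        // Warm-up passes so the JIT compiles both paths before timing.
        nanosPerCall(target::incrementSynchronized, iterations);
        nanosPerCall(target::incrementPlain, iterations);

        double syncNs = nanosPerCall(target::incrementSynchronized, iterations);
        double plainNs = nanosPerCall(target::incrementPlain, iterations);

        System.out.printf("synchronized: %.2f ns/call, plain: %.2f ns/call%n",
                syncNs, plainNs);
        // Print the counter so the increments cannot be discarded as dead code.
        System.out.println("counter = " + target.counter());
    }
}
```

Whatever the absolute numbers turn out to be, they only matter for the scan if the per-row work is small enough that a few tens of nanoseconds per call are visible in the total.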
> Major performance improvement: Avoid synchronization in StoreScanner
> --------------------------------------------------------------------
>
> Key: HBASE-10015
> URL: https://issues.apache.org/jira/browse/HBASE-10015
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Attachments: 10015-0.94.txt, TestLoad.java
>
>
> Did some more profiling (this time with a sampling profiler) and
> StoreScanner.peek() showed up a lot in the samples. At first that was
> surprising, but peek is synchronized, so it seems a lot of the sync'ing cost
> is eaten there.
> It seems the only reason we have to synchronize all these methods is because
> a concurrent flush or compaction can change the scanner stack, other than
> that only a single thread should access a StoreScanner at any given time.
> So I replaced updateReaders() with some code that just indicates to the scanner
> that the readers should be updated, and then made it the using thread's
> responsibility to do the work.
> The perf improvement from this is staggering. I am seeing somewhere around 3x
> scan performance improvement across all scenarios.
> Now, the hard part is to reason about whether this is 100% correct. I ran
> TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still
> pass.
> Will attach a sample patch.
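The idea in the quoted description (updateReaders() only signals that the readers changed, and the scanning thread does the actual work) can be sketched as follows. This is not the attached patch: LazyRefreshScanner, its fields, and the use of a sorted list of integers in place of store files are all hypothetical stand-ins, and the sketch assumes that the snapshot source is itself safe to call from the scanning thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

final class LazyRefreshScanner {
    // Hypothetical stand-in for the store: supplies the current sorted keys.
    private final Supplier<List<Integer>> snapshotSource;
    // Set by the flush/compaction thread, consumed by the scanning thread.
    private final AtomicBoolean refreshRequested = new AtomicBoolean(false);

    // The fields below are touched only by the scanning thread, so they need no lock.
    private List<Integer> snapshot;
    private int index;
    private Integer lastReturned;

    LazyRefreshScanner(Supplier<List<Integer>> snapshotSource) {
        this.snapshotSource = snapshotSource;
        this.snapshot = new ArrayList<>(snapshotSource.get());
    }

    // Called by another thread (flush or compaction): only raises the flag.
    void updateReaders() {
        refreshRequested.set(true);
    }

    // Called only by the scanning thread; returns null when the scan is exhausted.
    Integer next() {
        if (refreshRequested.compareAndSet(true, false)) {
            reopen();
        }
        if (index >= snapshot.size()) {
            return null;
        }
        lastReturned = snapshot.get(index++);
        return lastReturned;
    }

    // Takes a fresh snapshot and repositions just past the last key returned.
    private void reopen() {
        snapshot = new ArrayList<>(snapshotSource.get());
        index = 0;
        if (lastReturned != null) {
            while (index < snapshot.size() && snapshot.get(index) <= lastReturned) {
                index++;
            }
        }
    }
}
```

In this sketch the only shared state is the atomic flag, so the hot path in next() pays for one atomic check instead of a monitor acquisition. Whether the same reasoning holds for the real StoreScanner is exactly the correctness question raised in the description.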