[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555951#comment-13555951 ]
liang xie commented on HBASE-7495:
----------------------------------
I did an apples-to-apples comparison this morning. It shows that parallel seek can reduce read latency in a specific scenario (block cache disabled and a cold OS cache, as described below).
Attached is a preliminary patch, for reference only.
My test environment: 10 DataNode/RegionServer nodes, each with 12 x 2 TB SATA disks; hfile.block.cache.size=0 (i.e. the block cache is disabled); HBase 0.94.3, CDH 4.1.1.
My test data:
recordcount=1000000000
fieldcount=3
fieldlength=200
hbase(main):002:0> describe 'YCSBTest'
DESCRIPTION                                                        ENABLED
{NAME => 'YCSBTest', SPLIT_POLICY =>                               true
'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy',
FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '1', VERSIONS => '3',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
$ ./hdfs dfs -du -s -h hdfs://lgxl-xieliang/
726.8g  hdfs://lgxl-xieliang/
There are 100 regions in total, and most of them have between 0 and 5 store files (numberOfStorefiles in [0,5]).
My test command:
bin/ycsb run hbase -P ./workloads/kaka -threads 1 -p columnfamily=test -p table=YCSBTest -s > log/run.log 2>&1 &
Before each run I restarted the whole HBase/HDFS cluster and cleared the OS page cache (echo 1 > /proc/sys/vm/drop_caches).
Serial seek result:
[OVERALL], RunTime(ms), 300027.0
[OVERALL], Throughput(ops/sec), 20.09819116279535
[READ], Operations, 6030
[READ], AverageLatency(us), 49739.97446102819
[READ], MinLatency(us), 2768
[READ], MaxLatency(us), 782892
[READ], 50thPercentileLatency(ms), 45
[READ], 95thPercentileLatency(ms), 90
[READ], 99thPercentileLatency(ms), 124
[READ], Return=0, 6030
Parallel seek result:
[OVERALL], RunTime(ms), 300016.0
[OVERALL], Throughput(ops/sec), 39.584555490373845
[READ], Operations, 11876
[READ], AverageLatency(us), 25249.878410239136
[READ], MinLatency(us), 3084
[READ], MaxLatency(us), 753547
[READ], 50thPercentileLatency(ms), 22
[READ], 95thPercentileLatency(ms), 43
[READ], 99thPercentileLatency(ms), 67
[READ], Return=0, 11876
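As a quick cross-check of the two runs above, the following sketch only does arithmetic on the reported numbers; nothing in it is newly measured. It assumes that with -threads 1 YCSB issues one read at a time, so throughput should be close to 1,000,000 / AverageLatency(us). The class and method names are illustrative only.

```java
public class YcsbRunComparison {

    // Numbers copied from the serial and parallel runs above (1 client thread, ~300 s each).
    static final double SERIAL_THROUGHPUT = 20.09819116279535;       // ops/sec
    static final double PARALLEL_THROUGHPUT = 39.584555490373845;    // ops/sec
    static final double SERIAL_AVG_LATENCY_US = 49739.97446102819;   // microseconds
    static final double PARALLEL_AVG_LATENCY_US = 25249.878410239136; // microseconds

    /** With a single closed-loop client thread, ops/sec is roughly 1e6 / average latency in us. */
    static double expectedSingleThreadThroughput(double avgLatencyUs) {
        return 1_000_000.0 / avgLatencyUs;
    }

    /** Speedup expressed as a plain ratio a / b. */
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.printf("throughput speedup:      %.2fx%n",
                ratio(PARALLEL_THROUGHPUT, SERIAL_THROUGHPUT));
        System.out.printf("avg latency speedup:     %.2fx%n",
                ratio(SERIAL_AVG_LATENCY_US, PARALLEL_AVG_LATENCY_US));
        // 50th / 95th / 99th percentile latencies in ms, serial divided by parallel
        System.out.printf("p50 / p95 / p99 speedup: %.2fx / %.2fx / %.2fx%n",
                ratio(45, 22), ratio(90, 43), ratio(124, 67));
        System.out.printf("serial:   measured %.2f ops/s, 1e6/avgLatency %.2f ops/s%n",
                SERIAL_THROUGHPUT, expectedSingleThreadThroughput(SERIAL_AVG_LATENCY_US));
        System.out.printf("parallel: measured %.2f ops/s, 1e6/avgLatency %.2f ops/s%n",
                PARALLEL_THROUGHPUT, expectedSingleThreadThroughput(PARALLEL_AVG_LATENCY_US));
    }
}
```

Both the throughput ratio and the average-latency ratio come out at about 1.97x, and 1e6 / AverageLatency gives about 20.10 and 39.60 ops/sec against the measured 20.10 and 39.58, so the throughput gain is essentially the per-read latency reduction. The percentile ratios range from about 1.85x (99th) to 2.09x (95th).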
> parallel scanner seek in StoreScanner's constructor
> ---------------------------------------------------
>
> Key: HBASE-7495
> URL: https://issues.apache.org/jira/browse/HBASE-7495
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Affects Versions: 0.94.3, 0.96.0
> Reporter: liang xie
> Assignee: liang xie
> Attachments: HBASE-7495.txt
>
>
> It seems there is room for improvement before calling scanner.next:
> {code:title=StoreScanner.java|borderStyle=solid}
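> if (explicitColumnQuery && lazySeekEnabledGlobally) {
>   // lazy seek: each scanner may postpone the actual seek until it is read from
>   for (KeyValueScanner scanner : scanners) {
>     scanner.requestSeek(matcher.getStartKey(), false, true);
>   }
> } else {
>   // eager seek: position every scanner at the start key, one after another
>   for (KeyValueScanner scanner : scanners) {
>     scanner.seek(matcher.getStartKey());
>   }
> }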
> {code}
> We could run scanner.requestSeek or scanner.seek in parallel, instead of serially as now, to reduce latency in certain cases.
> Any ideas on this? I'll give it a try if the feedback is positive :)
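To make the proposal concrete, here is a minimal, self-contained Java sketch of the serial seek loop next to a parallel variant built on a plain ExecutorService. It is not the attached HBASE-7495.txt patch and does not use HBase classes: SimpleScanner, FakeScanner, seekSerially and seekInParallel are hypothetical stand-ins, the lazy requestSeek path is not modeled, and each simulated seek is just a sleep. With k store-file scanners the serial loop waits roughly for the sum of the k seeks, while the parallel version waits roughly for the slowest one, which is why the gain should be largest when the block cache is cold or disabled and several store files have to be read from disk.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSeekSketch {

    /** Hypothetical stand-in for HBase's KeyValueScanner; only seek() is modeled. */
    interface SimpleScanner {
        boolean seek(String startKey) throws IOException;
    }

    /** Hypothetical scanner whose seek blocks for a fixed time, like a cold-cache disk read. */
    static final class FakeScanner implements SimpleScanner {
        private final long seekMillis;
        volatile String seekedTo; // last key this scanner was positioned at

        FakeScanner(long seekMillis) {
            this.seekMillis = seekMillis;
        }

        @Override
        public boolean seek(String startKey) throws IOException {
            try {
                Thread.sleep(seekMillis); // simulated disk seek
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("seek interrupted");
            }
            seekedTo = startKey;
            return true;
        }
    }

    /** Serial version, shaped like the loop quoted from StoreScanner. */
    static void seekSerially(List<? extends SimpleScanner> scanners, String startKey)
            throws IOException {
        for (SimpleScanner scanner : scanners) {
            scanner.seek(startKey);
        }
    }

    /** Parallel version: submit every seek first, then wait for all of them to finish. */
    static void seekInParallel(List<? extends SimpleScanner> scanners, String startKey,
                               ExecutorService pool) throws IOException, InterruptedException {
        List<Future<Boolean>> futures = new ArrayList<>(scanners.size());
        for (SimpleScanner scanner : scanners) {
            Callable<Boolean> task = () -> scanner.seek(startKey);
            futures.add(pool.submit(task));
        }
        try {
            for (Future<Boolean> f : futures) {
                f.get(); // blocks until this seek is done; a failure surfaces as ExecutionException
            }
        } catch (ExecutionException e) {
            for (Future<Boolean> f : futures) {
                f.cancel(true); // best effort: interrupt seeks that are still running
            }
            throw new IOException("parallel seek failed", e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        List<FakeScanner> scanners = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            scanners.add(new FakeScanner(50)); // e.g. 5 store files, 50 ms per cold seek
        }

        long start = System.nanoTime();
        seekSerially(scanners, "row-0001");
        long serialMs = (System.nanoTime() - start) / 1_000_000;

        ExecutorService pool = Executors.newFixedThreadPool(scanners.size());
        try {
            start = System.nanoTime();
            seekInParallel(scanners, "row-0002", pool);
            long parallelMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("serial: ~" + serialMs + " ms, parallel: ~" + parallelMs + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real region server one would likely reuse a shared, bounded thread pool rather than create one per scan, and keep the serial path when there is only one scanner, since handing a cheap (cached) seek to another thread can cost more than it saves.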
--
This message is automatically generated by JIRA.