[
https://issues.apache.org/jira/browse/HBASE-21520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719936#comment-16719936
]
Zheng Hu commented on HBASE-21520:
----------------------------------
Added some logging to show the time cost in the ROWCOL case. See the attached
file: in the ROWCOL case it costs about 40~80ms to scan over the table, but in
the ROW case it costs about 8ms.
At first, I wondered why ROWCOL slows down the scan, since I thought the bloom
filter is needed only when opening the scanner. After adding some logging, I
found that the following stack also calculates the bloom filter value. I guess
this is the problem.
{code}
===> useBloom in requestSeek: true
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.hbase.regionserver.StoreFileScanner.requestSeek(StoreFileScanner.java:398)
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:318)
org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:275)
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:989)
org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:980)
org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:749)
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:637)
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:153)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6597)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6761)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6534)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:6511)
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:6498)
{code}
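The comment does not show the logging code that produced this trace. As a rough sketch only, a trace in this shape can be produced with `Thread.currentThread().getStackTrace()`. The class `SeekTraceSketch` and its methods are hypothetical stand-ins for illustration, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

public class SeekTraceSketch {
    // Hypothetical helper: returns a label line followed by one line per
    // stack frame, similar in shape to the trace quoted above.
    static List<String> captureStack(String label) {
        List<String> lines = new ArrayList<>();
        lines.add("===> " + label);
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            lines.add(frame.toString());
        }
        return lines;
    }

    // Hypothetical stand-in for a seek path that may consult the bloom filter.
    static List<String> requestSeek(boolean useBloom) {
        return captureStack("useBloom in requestSeek: " + useBloom);
    }

    public static void main(String[] args) {
        // Print the label and every frame of the current call stack.
        requestSeek(true).forEach(System.out::println);
    }
}
```

Logging the stack at the point where the bloom filter is consulted shows which caller reaches it, which is how a per-cell reseek path can be told apart from the scanner-open path.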
> TestMultiColumnScanner cost long time when using ROWCOL bloom type
> ------------------------------------------------------------------
>
> Key: HBASE-21520
> URL: https://issues.apache.org/jira/browse/HBASE-21520
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: rowcol.txt
>
>
> TestMultiColumnScanner times out easily; see HBASE-21517.
> On my local machine, when I set the parameters to {
> Compression.Algorithm.NONE, BloomType.ROW, false }, it took about 5 seconds,
> but when I set the parameters to { Compression.Algorithm.NONE,
> BloomType.ROWCOL, false }, it took about 45 seconds, which means
> ROWCOL costs much more time than ROW.
> Need to find out what's wrong with this unit test.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)