[ 
https://issues.apache.org/jira/browse/HBASE-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816911#comment-13816911
 ] 

Lars Hofhansl commented on HBASE-9915:
--------------------------------------

Some numbers with Phoenix: 5m rows, 5 long columns, 8 byte rowkeys, FAST_DIFF 
encoding, table fully flushed and major compacted, everything in the blockcache.
(The columns are weirdly named because this was a preexisting table that I 
mapped into Phoenix with CREATE TABLE.)

||Query||Without Patch||With Patch||
|select count\(*) from "my5"|12.8s|9.7s|
|select count\(*) from "my5" where "3" = 1|23.5s|11.8s|
|select count\(*) from "my5" where "3" > 1|34.8s|15.6s|
|select avg("3") from "my5"|35.6s|17.4s|
|select avg("0"), avg("3") from "my5"|36.5s|20.2s|
|select avg("0"), avg("3") from "my5" where "4" = 1|31.8s|15.4s|
|select avg("0"), avg("3") from "my5" where "4" > 1|46.4s|25.1s|

Note that Phoenix adds a "fake" column to each row (so each row has a known KV 
for things like COUNT) and (almost) always uses the ExplicitColumnTracker.


> Severe performance bug: isSeeked() in EncodedScannerV2 is always false
> ----------------------------------------------------------------------
>
>                 Key: HBASE-9915
>                 URL: https://issues.apache.org/jira/browse/HBASE-9915
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.96.1, 0.94.14
>
>         Attachments: 9915-0.94.txt, 9915-trunk-v2.txt, 9915-trunk.txt, 
> profile.png
>
>
> While debugging why reseek is so slow, I found that it is quite broken for 
> encoded scanners.
> The problem is this:
> AbstractScannerV2.reseekTo(...) calls isSeeked() to check whether the scanner 
> has been seeked. If it has, it checks whether the KV we want to seek to is in 
> the current block; if not, it always consults the index blocks again.
> isSeeked() checks the blockBuffer member, which is not used by 
> EncodedScannerV2, so it always returns false, which in turn causes an index 
> lookup for each reseek.
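
To make the description above concrete, here is a minimal, self-contained Java
sketch of the failure mode. It is a simplified model and not the HBase source:
the class names (SketchScanner, PlainScanner, BuggyEncodedScanner,
FixedEncodedScanner), the block size, and the lookup counter are all
hypothetical, and the isSeeked() override in FixedEncodedScanner is only an
assumption about what a fix could look like, not a copy of the attached patches.

```java
import java.nio.ByteBuffer;

public class ReseekSketch {
    // Hypothetical number of keys per block, chosen only for illustration.
    static final int KEYS_PER_BLOCK = 100;

    /** Simplified stand-in for the base scanner; not the real AbstractScannerV2. */
    abstract static class SketchScanner {
        protected ByteBuffer blockBuffer; // set only by the non-encoded scanner
        protected int currentBlock = -1;  // block the scanner is positioned in
        int indexLookups = 0;             // simulated index-block consultations

        // Mirrors the check described in the issue: "seeked" == blockBuffer is set.
        boolean isSeeked() { return blockBuffer != null; }

        abstract void loadBlock(int blockNo);

        // A full seek always goes through the (simulated) block index.
        void seekTo(long key) {
            indexLookups++;
            loadBlock((int) (key / KEYS_PER_BLOCK));
        }

        // Stay in the current block when possible, else fall back to a full seek.
        void reseekTo(long key) {
            if (isSeeked() && key / KEYS_PER_BLOCK == currentBlock) {
                return;
            }
            seekTo(key);
        }
    }

    /** Non-encoded scanner: keeps its state in blockBuffer. */
    static class PlainScanner extends SketchScanner {
        @Override void loadBlock(int blockNo) {
            currentBlock = blockNo;
            blockBuffer = ByteBuffer.allocate(0);
        }
    }

    /** Encoded scanner: never touches blockBuffer, so isSeeked() stays false. */
    static class BuggyEncodedScanner extends SketchScanner {
        @Override void loadBlock(int blockNo) { currentBlock = blockNo; }
    }

    /** Assumed fix: base isSeeked() on state the encoded scanner actually uses. */
    static class FixedEncodedScanner extends BuggyEncodedScanner {
        @Override boolean isSeeked() { return currentBlock >= 0; }
    }

    // Seek to key 0, then reseek to each following key; return index lookups.
    static int lookups(SketchScanner s, int rows) {
        s.seekTo(0);
        for (long k = 1; k < rows; k++) {
            s.reseekTo(k);
        }
        return s.indexLookups;
    }

    public static void main(String[] args) {
        System.out.println("plain:         " + lookups(new PlainScanner(), 1000));
        System.out.println("buggy encoded: " + lookups(new BuggyEncodedScanner(), 1000));
        System.out.println("fixed encoded: " + lookups(new FixedEncodedScanner(), 1000));
    }
}
```

In this toy model, 1000 sequential keys cost one index lookup per block (10) for
the plain and fixed scanners, but one per key (1000) for the buggy encoded
scanner. The real scanners do considerably more work per reseek, so the sketch
only shows the shape of the problem, not the timings in the table.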



--
This message was sent by Atlassian JIRA
(v6.1#6144)
