[ 
https://issues.apache.org/jira/browse/HBASE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384273#comment-14384273
 ] 

Lars Hofhansl edited comment on HBASE-13291 at 3/27/15 6:13 PM:
----------------------------------------------------------------

Ah. Hence the seeking. In that case the optimize won't help, but it also still 
won't do any work for those (it only does anything for SEEK_NEXT_COL and 
SEEK_NEXT_ROW).
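
To make the point above concrete, here is a rough sketch of that kind of 
seek-vs-skip decision. It is not the actual HBase code: the class, the enum, 
the OTHER stand-in code and the seekTargetInCurrentBlock flag are all 
hypothetical, and how the scanner really decides that the target is still in 
the current block is not shown.

```java
// Simplified illustration only; names are hypothetical, not the HBase API.
class SeekOrSkipSketch {
    // Stand-ins for match codes; OTHER represents any code the optimize ignores.
    enum MatchCode { SKIP, SEEK_NEXT_COL, SEEK_NEXT_ROW, OTHER }

    // Assumption: the caller already knows whether the seek target lies
    // inside the block currently being read.
    static MatchCode optimize(MatchCode code, boolean seekTargetInCurrentBlock) {
        switch (code) {
            case SEEK_NEXT_COL:
            case SEEK_NEXT_ROW:
                // Skipping cell by cell inside the same block is assumed
                // cheaper than a full seek that would land in it anyway.
                return seekTargetInCurrentBlock ? MatchCode.SKIP : code;
            default:
                // Every other code passes through untouched: no work done.
                return code;
        }
    }

    public static void main(String[] args) {
        System.out.println(optimize(MatchCode.SEEK_NEXT_ROW, true));
        System.out.println(optimize(MatchCode.SEEK_NEXT_ROW, false));
        System.out.println(optimize(MatchCode.OTHER, true));
    }
}
```

With this shape, a scan dominated by other kinds of seeks pays almost nothing 
for the optimize call but also gains nothing from it.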

But since you see optimize being called and doing some work, there must be many 
COL or ROW seeks (ExplicitColumnTracker probably, at least seeking to the next 
row), and I'd expect it to be worse without HBASE-13109 applied. Then again, 
with large cells there won't be much advantage to skipping a lot before we'd 
hit the next block (at most 64 cells would fit into a 64k block)... Would be 
cool to know what perf you'd see without HBASE-13109.
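
The "at most 64 cells" figure is just block size divided by cell size. A tiny 
sketch of that arithmetic, assuming cells of roughly 1 KB (the cell size is an 
assumption here, inferred from the 64-per-64k figure) and ignoring per-block 
overhead; the class and method names are made up for illustration:

```java
// Back-of-the-envelope arithmetic only; not HBase code.
class CellsPerBlockSketch {
    // Upper bound on whole cells that fit in one block, ignoring any
    // header or index overhead inside the block.
    static long maxCellsPerBlock(long blockSizeBytes, long cellSizeBytes) {
        if (cellSizeBytes <= 0) {
            throw new IllegalArgumentException("cell size must be positive");
        }
        return blockSizeBytes / cellSizeBytes;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024; // 64k block, as in the comment above
        // Assumed ~1 KB cells: prints 64
        System.out.println(maxCellsPerBlock(blockSize, 1024));
        // Hypothetical 100-byte cells for contrast: prints 655
        System.out.println(maxCellsPerBlock(blockSize, 100));
    }
}
```

So with large cells a skip can pass over a few dozen cells at best before the 
block ends, while small cells leave far more room for skipping to pay off.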

Is this a common use case? We'd have essentially an HFile block per row here, 
right?

I'll also test with wider rows for this issue... Very curious about this!



was (Author: lhofhansl):
Ah. Hence the seeking. In that case the optimize won't help, but it also still 
won't do any work for those (it only does anything for SEEK_NEXT_COL and 
SEEK_NEXT_ROW).

But since you see optimize being called and doing some work, there must be many 
COL or ROW seeks, and I'd expect it to be worse without HBASE-13109 applied. 
Then again, with large cells there won't be much advantage to skipping a lot 
before we'd hit the next block (at most 64 cells would fit into a 64k block)... 
Would be cool to know what perf you'd see without HBASE-13109.

Is this a common use case? We'd have essentially an HFile block per row here, 
right?

I'll also test with wider rows for this issue... Very curious about this!


> Lift the scan ceiling
> ---------------------
>
>                 Key: HBASE-13291
>                 URL: https://issues.apache.org/jira/browse/HBASE-13291
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Assignee: stack
>         Attachments: 13291.inlining.txt, Screen Shot 2015-03-26 at 12.12.13 
> PM.png, Screen Shot 2015-03-26 at 3.39.33 PM.png, hack_to_bypass_bb.txt, 
> nonBBposAndInineMvccVint.txt, q (1).png, traces.7.svg, traces.filterall.svg, 
> traces.nofilter.svg, traces.small2.svg, traces.smaller.svg
>
>
> Scanning medium-sized rows with multiple concurrent scanners exhibits 
> interesting 'ceiling' properties. A server runs at about 6.7k ops a second 
> using 450% of a possible 1600% of CPU when 4 clients, each with 10 threads, 
> are each scanning 1000 rows. If I add the '--filterAll' argument (do not 
> return results), then we run at 1450% of a possible 1600% but we do 8k ops a 
> second.
> Let me attach flame graphs for the two cases. Unfortunately, there is some 
> frustrating dark art going on. Let me try to figure it out... Filing this 
> issue in the meantime to keep score in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
