Great investigation, Andy. Do you know of any Jiras that made changes in
SQM? It would be great if you could attach your patch which tracks the
scan flow. If we have a Jira for this issue, can you please attach it there?

Anoop

On Fri, Jun 26, 2020 at 1:56 AM Andrew Purtell <andrew.purt...@gmail.com>
wrote:

> Related, I think I found a bug in branch-1 where we don’t heartbeat in the
> filter all case until we switch store files, so scanning a very large store
> file might time out with client defaults. Remarking on this here so I don’t
> forget to follow up.
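>
> Roughly the shape of the fix I have in mind - a sketch against the
> ScannerContext API in the region scanner loop, not a tested patch:
>
>   // Per cell: even when the filter excludes everything and no results
>   // accumulate, check the time limit so a heartbeat response can go
>   // back to the client, instead of only checking when we switch store
>   // files.
>   if (scannerContext.checkTimeLimit(LimitScope.BETWEEN_CELLS)) {
>     return scannerContext.setScannerState(NextState.TIME_LIMIT_REACHED)
>         .hasMoreValues();
>   }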
>
> > On Jun 25, 2020, at 12:27 PM, Andrew Purtell <apurt...@apache.org>
> wrote:
> >
> > 
> > I repeated this test with pe --filterAll and the results were revealing,
> > at least for this case. I also patched in a thread-local hash map of
> > counters that I could update from code paths in SQM, StoreScanner,
> > HFileReader*, and HFileBlock. Because an RPC is processed by a single
> > handler thread, I could update counters and accumulate micro-timings via
> > System#nanoTime() per RPC and dump them out of CallRunner in some new
> > trace logging. I spent a couple of days making sure the instrumentation
> > was placed equivalently in both the 1.6 and 2.2 code bases and was
> > producing consistent results. I can provide these patches upon request.
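> >
> > For the curious, the counter side of the instrumentation is roughly this
> > shape - a minimal sketch with made-up names, not the actual patch:
> >
> >   import java.util.HashMap;
> >   import java.util.Map;
> >
> >   public final class PerRpcCounters {
> >     // One map per handler thread. An RPC is processed start to finish
> >     // by a single handler thread, so the hot path needs no locking.
> >     private static final ThreadLocal<Map<String, Long>> COUNTERS =
> >         ThreadLocal.withInitial(HashMap::new);
> >
> >     private PerRpcCounters() {}
> >
> >     // Bump a named counter from an instrumented code path. Timed
> >     // sections accumulate the same way via System.nanoTime():
> >     //   long t0 = System.nanoTime();
> >     //   ... SQM / StoreScanner / HFileBlock code path ...
> >     //   PerRpcCounters.add("store_reseek_ns", System.nanoTime() - t0);
> >     public static void add(String name, long delta) {
> >       COUNTERS.get().merge(name, delta, Long::sum);
> >     }
> >
> >     // Called at the end of call processing (CallRunner, in my patch)
> >     // to log and reset the counters for the completed RPC.
> >     public static Map<String, Long> drain() {
> >       Map<String, Long> out = COUNTERS.get();
> >       COUNTERS.remove();
> >       return out;
> >     }
> >   }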
> >
> > Again, test tables with one family and 1, 5, 10, 20, 50, and 100
> > distinct column-qualifiers per row. After loading the table I made a
> > snapshot and cloned the snapshot for testing, for both 1.6 and 2.2, so
> > both versions were tested using the exact same data files on HDFS. I
> > also used the 1.6 version of PE for both, so the only change is on the
> > server (1.6 vs 2.2 masters and regionservers).
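> >
> > (From the HBase shell that is just: snapshot 'TestTable_f1_c20',
> > 'tt_c20_snap' followed by clone_snapshot 'tt_c20_snap',
> > 'TestTable_f1_c20_clone', run for each version - table and snapshot
> > names here are illustrative.)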
> >
> > It appears a refactor to ScanQueryMatcher and friends has disabled the
> > ability of filters to provide SKIP hints. That prevents us from
> > bypassing version checking (so some extra cost in SQM), and it appears
> > to disable an optimization that avoids reseeking, leading to a serious
> > and proportional regression in reseek activity and in time spent in
> > that code path. So for queries that use filters, there can be a
> > substantial regression.
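> >
> > To make the mechanism concrete, here is a small self-contained sketch
> > of the two behaviors as I read them from the counters below; all names
> > are stand-ins, not the real SQM types:
> >
> >   final class SkipHintSketch {
> >     enum FilterHint { INCLUDE, SKIP }
> >     enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }
> >
> >     // branch-1-like: a filter SKIP is honored as-is; the scanner just
> >     // advances to the next cell already in memory (cheap), and version
> >     // checking is bypassed entirely.
> >     static MatchCode matchOld(FilterHint hint) {
> >       return hint == FilterHint.SKIP ? MatchCode.SKIP : MatchCode.INCLUDE;
> >     }
> >
> >     // branch-2-like: the SKIP no longer bypasses the column/version
> >     // trackers, which upgrade it to a seek hint; each seek hint then
> >     // becomes a StoreScanner reseek (see the store_reseek blowup in
> >     // the tables below).
> >     static MatchCode matchNew(FilterHint hint, boolean versionsExhausted) {
> >       if (hint == FilterHint.SKIP && versionsExhausted) {
> >         return MatchCode.SEEK_NEXT_COL;
> >       }
> >       return hint == FilterHint.SKIP ? MatchCode.SKIP : MatchCode.INCLUDE;
> >     }
> >   }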
> >
> > Other test cases that did not use filters did not show a regression.
> >
> > A test case where I used ROW_INDEX_V1 encoding showed an expected,
> > modest, proportional regression in seeking time, because that encoding
> > is optimized for point queries and not for the full table scan case.
> >
> > I will come back here when I understand this better.
> >
> > Here are the results for the pe --filterAll case:
> >
> >
> > Legend: each cell is 1.6.0 value / 2.2.5 value (2.2.5 as a % of 1.6.0).
> > "(!)" marks blowups where the 1.6.0 value is near zero, so a percentage
> > is not meaningful.
> >
> > Counts
> >
> > (The higher rpc counts on 2.2.5 are due to better heartbeating.)
> >
> > metric                                     c1                            c5                            c10
> > rpcs                                       1 / 2 (200%)                  2 / 6 (300%)                  2 / 10 (500%)
> > block_reads                                11507 / 11508 (100%)          57255 / 57257 (100%)          114471 / 114474 (100%)
> > block_unpacks                              11507 / 11508 (100%)          57255 / 57257 (100%)          114471 / 114474 (100%)
> > seeker_next                                10000000 / 10000000 (100%)    50000000 / 50000000 (100%)    100000000 / 100000000 (100%)
> > store_next                                 10000000 / 9988268 (100%)     50000000 / 49940082 (100%)    100000000 / 99879401 (100%)
> > store_reseek                               1 / 11733 (!)                 2 / 59924 (!)                 8 / 120607 (!)
> > cells_matched                              20000000 / 20000000 (100%)    60000000 / 60000000 (100%)    110000000 / 110000000 (100%)
> > column_hint_include                        10000000 / 10000000 (100%)    50000000 / 50000000 (100%)    100000000 / 100000000 (100%)
> > filter_hint_skip                           10000000 / 10000000 (100%)    50000000 / 50000000 (100%)    100000000 / 100000000 (100%)
> > sqm_hint_done                              9999999 / 9999999 (100%)      9999998 / 9999998 (100%)      9999998 / 9999998 (100%)
> > sqm_hint_seek_next_col                     1 / 0 (0%)                    2 / 40000002 (!)              2 / 90000002 (!)
> > sqm_hint_seek_next_row                     0 / 10000000 (!)              2 / 10000000 (!)              0 / 10000000 (!)
> > sqm_hint_skip                              10000000 / 0 (0%)             50000000 / 0 (0%)             100000000 / 0 (0%)
> > versions_hint_include_and_seek_next_col    0 / 0 (0%)                    0 / 40000000 (!)              0 / 90000000 (!)
> > versions_hint_include_and_seek_next_row    0 / 0 (0%)                    0 / 10000000 (!)              0 / 10000000 (!)
> >
> > metric                                     c20                           c50                           c100
> > rpcs                                       3 / 17 (567%)                 4 / 37 (925%)                 8 / 72 (900%)
> > block_reads                                230372 / 230377 (100%)        578292 / 578298 (100%)        1157955 / 1157963 (100%)
> > block_unpacks                              230372 / 230377 (100%)        578292 / 578298 (100%)        1157955 / 1157963 (100%)
> > seeker_next                                200000000 / 200000000 (100%)  500000000 / 500000000 (100%)  1000000000 / 1000000000 (100%)
> > store_next                                 200000000 / 199766539 (100%)  500000000 / 499414653 (100%)  1000000000 / 998836518 (100%)
> > store_reseek                               6 / 233467 (!)                10 / 585357 (!)               8 / 1163490 (!)
> > cells_matched                              210000000 / 210000000 (100%)  510000000 / 510000000 (100%)  1010000000 / 1010000000 (100%)
> > column_hint_include                        200000000 / 200000000 (100%)  500000000 / 500000000 (100%)  1000000000 / 1000000000 (100%)
> > filter_hint_skip                           200000000 / 200000000 (100%)  500000000 / 500000000 (100%)  1000000000 / 1000000000 (100%)
> > sqm_hint_done                              9999997 / 9999997 (100%)      9999996 / 9999996 (100%)      9999992 / 9999992 (100%)
> > sqm_hint_seek_next_col                     3 / 190000003 (!)             4 / 490000004 (!)             8 / 990000008 (!)
> > sqm_hint_seek_next_row                     0 / 10000000 (!)              0 / 10000000 (!)              0 / 10000000 (!)
> > sqm_hint_skip                              200000000 / 0 (0%)            500000000 / 0 (0%)            1000000000 / 0 (0%)
> > versions_hint_include_and_seek_next_col    0 / 190000000 (!)             0 / 490000000 (!)             0 / 990000000 (!)
> > versions_hint_include_and_seek_next_row    0 / 10000000 (!)              0 / 10000000 (!)              0 / 10000000 (!)
> >
> > Times
> >
> > metric                                     c1                            c5                            c10
> > block_read_ms                              174.12 / 215.96 (124%)        683.78 / 883.02 (129%)        1724.66 / 1937.87 (112%)
> > block_unpack_ms                            0.63 / 0.62 (99%)             2.35 / 2.35 (100%)            4.36 / 4.56 (105%)
> > seeker_next_ms                             1073.45 / 1053.51 (98%)       4452.52 / 4728.96 (106%)      9359.61 / 9706.06 (104%)
> > store_next_ms                              2651.11 / 2423.27 (91%)       12048.64 / 11398.63 (95%)     24833.68 / 23586.20 (95%)
> > store_reseek_ms                            2.71 / 62.17 (2297%)          10.93 / 233.67 (2139%)        10.88 / 401.88 (3693%)
> >
> > metric                                     c20                           c50                           c100
> > block_read_ms                              3199.93 / 3470.92 (108%)      7582.96 / 7941.45 (105%)      14701.52 / 16022.57 (109%)
> > block_unpack_ms                            9.10 / 8.30 (91%)             21.59 / 22.01 (102%)          40.97 / 40.93 (100%)
> > seeker_next_ms                             18879.28 / 18330.93 (97%)     47664.95 / 46404.26 (97%)     96511.83 / 92121.30 (95%)
> > store_next_ms                              48493.26 / 46574.38 (96%)     120786.00 / 115603.00 (96%)   240627.00 / 227945.00 (95%)
> > store_reseek_ms                            13.98 / 783.18 (5601%)        22.26 / 1939.28 (8714%)       24.32 / 4065.25 (16714%)
> >
> > Test parameters: Rows=10000000, ValueSize=20, BlockEncoding=FAST_DIFF,
> > Compress=NONE, Filter=ALL
> >
> > On Thu, Jun 11, 2020 at 5:08 PM Andrew Purtell <apurt...@apache.org>
> wrote:
> >> I used PE to generate 10M row tables with one family and either 1, 10,
> >> 20, 50, or 100 values per row (unique column-qualifiers). An increase
> >> in wall clock time was noticeable, for example:
> >>
> >> 1.6.0
> >>
> >> time ./bin/hbase pe --rows=5000000 --table=TestTable_f1_c20 --columns=20 --nomapred scan 2
> >> real 1m20.179s
> >> user 0m45.470s
> >> sys 0m16.572s
> >>
> >> 2.2.5
> >>
> >> time ./bin/hbase pe --rows=5000000 --table=TestTable_f1_c20 --columns=20 --nomapred scan 2
> >> real 1m31.016s
> >> user 0m48.658s
> >> sys 0m16.965s
> >>
> >> It didn't really make a difference if I used 1 thread or 4 or 10; the
> >> delta was about the same, proportionally. I picked two threads in the
> >> end so I'd have enough time to launch async-profiler twice in another
> >> shell, to capture flamegraph and call tree output, respectively.
> >> async-profiler captured 10 seconds at steady state per test case. Upon
> >> first inspection, what jumps out is an increasing proportion of CPU
> >> time spent in GC in 2.2.5 vs 1.6.0. The difference increases as the
> >> number of columns increases: there is little apparent difference at 1
> >> column, but a 2x or more difference at 20 columns, and a 10x or more
> >> difference at 100 columns, eyeballing the charts and flipping back and
> >> forth between browser windows. This seems more than coincidental, but
> >> it obviously calls for capture and analysis of a GC trace with JFR.
> >> Will do that next.
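> >>
> >> (For reference, the async-profiler invocations were along these lines;
> >> output paths and the regionserver pid are placeholders:
> >> ./profiler.sh -d 10 -o svg -f /tmp/scan-flame.svg <RS_PID> for the
> >> flamegraph, and ./profiler.sh -d 10 -o tree -f /tmp/scan-tree.html
> >> <RS_PID> for the call tree.)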
> >>
> >> JVM: openjdk version "1.8.0_232" OpenJDK Runtime Environment (Zulu
> >> 8.42.0.21-CA-macosx) (build 1.8.0_232-b18) OpenJDK 64-Bit Server VM
> >> (Zulu 8.42.0.21-CA-macosx) (build 25.232-b18, mixed mode)
> >>
> >> Regionserver JVM flags: -Xms10g -Xmx10g -XX:+UseG1GC
> >> -XX:+AlwaysPreTouch -XX:+UseNUMA -XX:-UseBiasedLocking
> >> -XX:+ParallelRefProcEnabled
> >>
> >>
> >> On Thu, Jun 11, 2020 at 7:06 AM Jan Van Besien <ja...@ngdata.com>
> wrote:
> >>> This is promising, thanks a lot. Testing with hbase 2.2.5 shows an
> >>> improvement, but we're not there yet.
> >>>
> >>> As reported earlier, hbase 2.1.0 was about 60% slower than hbase 1.2.0
> >>> in a test that simply scans all the regions in parallel without any
> >>> filter. A test with hbase 2.2.5 shows it to be about 40% slower than
> >>> 1.2.0. So that is better than 2.1.0, but still substantially slower
> >>> than what hbase 1.2.0 was.
> >>>
> >>> As before, I tested this both on a 3 node cluster as well as with a
> >>> unit test using HBaseTestingUtility. Both tests show very similar
> >>> relative differences.
> >>>
> >>> Jan
> >>>
> >>> On Thu, Jun 11, 2020 at 2:16 PM Anoop John <anoop.hb...@gmail.com>
> wrote:
> >>> >
> >>> > In another mail thread Zheng Hu brought up an important Jira fix:
> >>> > https://issues.apache.org/jira/browse/HBASE-21657
> >>> > Can you please check with this once?
> >>> >
> >>> > Anoop
> >>> >
> >>> >
> >>> > On Tue, Jun 9, 2020 at 8:08 PM Jan Van Besien <ja...@ngdata.com>
> wrote:
> >>> >
> >>> > > On Sun, Jun 7, 2020 at 7:49 AM Anoop John <anoop.hb...@gmail.com>
> wrote:
> >>> > > > As per the above configs, it looks like Bucket Cache is not
> >>> > > > being used; only the on-heap LRU cache is in use.
> >>> > >
> >>> > > True (but it is large enough to hold everything, so I don't think
> >>> > > it matters).
> >>> > >
> >>> > > > @Jan - Is it possible for you to test with an off-heap Bucket
> >>> > > > Cache? Configure the bucket cache in off-heap mode with a size
> >>> > > > of ~7.5 GB.
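> >>> > > > For example, something like this in hbase-site.xml (sizes are
> >>> > > > illustrative): hbase.bucketcache.ioengine = offheap and
> >>> > > > hbase.bucketcache.size = 7680 (MB), with the direct memory limit
> >>> > > > (e.g. HBASE_OFFHEAPSIZE) set high enough to cover it.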
> >>> > >
> >>> > > I did a quick test but it seems not to make a measurable
> >>> > > difference. If anything, it is actually even slightly slower. I
> >>> > > see a 100% hit ratio in the L1 LruBlockCache and effectively also
> >>> > > 100% in the L2 BucketCache (the hit ratio is not yet at 100%, but
> >>> > > hits increase with every test and misses do not).
> >>> > >
> >>> > > Given that the LruBlockCache was already large enough to cache
> >>> > > all the data anyway, I did not expect this to help either, to be
> >>> > > honest.
> >>> > >
> >>> > > > Do you have any DataBlockEncoding enabled on the CF?
> >>> > >
> >>> > > Yes, FAST_DIFF. But this is of course true in both the tests with
> >>> > > hbase2 and hbase1.
> >>> > >
> >>> > > Jan
> >>> > >
> >>
> >>
> >> --
> >> Best regards,
> >> Andrew
> >>
> >> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
> >>    - A23, Crosstalk
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
> >    - A23, Crosstalk
>
