[
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221309#comment-17221309
]
ramkrishna.s.vasudevan commented on HBASE-24637:
------------------------------------------------
[~larsh] and [~apurtell]
I have been trying to reproduce this issue and was finally able to do so. With
full scans, the issue reproduces whenever the scan uses addColumn() and the
added columns cover most of the columns in the given row. If only 3 (random)
columns out of 25 have to be covered, we don't see much of an impact. But if
we need to cover >20 columns, the effect is much more pronounced. With PE in
particular, if we have 25 columns then by default all of them get covered via
addColumn(). If that were not the case, this perf issue would not be visible.
Now, coming to the issue: as [~larsh] rightly pointed out, the SQM now returns
SEEK_NEXT_COL when the filter says SKIP and the tracker says SEEK, and that is
how we end up in this issue. But at the StoreScanner (as per my observation)
it is not the reseek that is actually causing the problem. The reason is that
we don't actually reseek; instead we do more comparisons in the case where we
go through tryskipOrSeekToNextcolumn(). In previous 1.x branches we directly
got a SKIP and so we just did a skip, but here we have to decide whether to
skip or seek, and that is where we spend more time. In a table with 100000
rows and 20 columns (all added via addColumn()) we are doing approximately
100000*20 more compares. There is no seek happening at all. This is from a
simple test case running against a mini DFS cluster, with filterAll and all 20
columns added to the scan (all data in cache, versions 1).
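To make the arithmetic above concrete, here is a minimal toy model in plain
Java. It is only a sketch and not the actual StoreScanner or ScanQueryMatcher
code: the class, method, and field names are hypothetical, and it assumes
exactly one extra key comparison per visited cell on the 2.x-style path, with
the next column always found in the already-cached block so that no seek is
ever issued.

```java
// Toy model only: names are hypothetical and do not exist in HBase.
class SkipVsSeekModel {

    /** Work counted during one simulated full scan. */
    static final class Counters {
        long cellsVisited;   // cells the scanner stepped over
        long extraCompares;  // compares spent deciding skip vs. seek
        long seeks;          // real seeks issued
    }

    /** 1.x-style path in this model: the matcher says SKIP, the scanner just advances. */
    static Counters scanWithPlainSkip(int rows, int coveredColumns) {
        Counters c = new Counters();
        for (int r = 0; r < rows; r++) {
            for (int q = 0; q < coveredColumns; q++) {
                c.cellsVisited++;
            }
        }
        return c;
    }

    /**
     * 2.x-style path in this model: the matcher says SEEK_NEXT_COL, so the
     * scanner first pays one compare to decide whether a skip is enough.
     * Assumption: nextColumnInCurrentBlock is true when all data is cached
     * in the current block, in which case the decision is always "skip".
     */
    static Counters scanWithSkipOrSeekDecision(int rows, int coveredColumns,
                                               boolean nextColumnInCurrentBlock) {
        Counters c = new Counters();
        for (int r = 0; r < rows; r++) {
            for (int q = 0; q < coveredColumns; q++) {
                c.cellsVisited++;
                c.extraCompares++;          // the skip-or-seek decision
                if (!nextColumnInCurrentBlock) {
                    c.seeks++;              // never reached in the cached case
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Counters oldPath = scanWithPlainSkip(100000, 20);
        Counters newPath = scanWithSkipOrSeekDecision(100000, 20, true);
        System.out.println("1.x-style extra compares: " + oldPath.extraCompares);
        System.out.println("2.x-style extra compares: " + newPath.extraCompares);
        System.out.println("2.x-style seeks:          " + newPath.seeks);
    }
}
```

Under these assumptions the 2.x-style path counts 2000000 extra compares and 0
seeks for 100000 rows and 20 covered columns. The same model with only 3
covered columns gives 300000, which would be consistent with the much smaller
impact seen in that case.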
> Reseek regression related to filter SKIP hinting
> ------------------------------------------------
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
> Issue Type: Bug
> Components: Filters, Performance, Scanners
> Affects Versions: 2.2.5
> Reporter: Andrew Kyle Purtell
> Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf,
> W-7665966-Instrument-low-level-scan-details-branch-1.patch,
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch,
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate
> significantly better microbenchmarks in a number of cases, and usually shows
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call
> metrics that leverage the fact it puts a reference to the current Call into a
> thread local and that all activity for a given RPC is processed by a single
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock,
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6
> and 2.2 versions under test operated on identical data files in HDFS. For
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached.
> It appears a refactor to ScanQueryMatcher and friends has disabled the
> ability of filters to provide meaningful SKIP hints, which disables an
> optimization that avoids reseeking, leading to a serious and proportional
> regression in reseek activity and time spent in that code path. So for
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was
> almost identical, as measured by counts of the hint types returned, whether
> or not column or version trackers are called, and counts of store seeks or
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and
> results generally fell within this range, except for the filter all case of
> course.