[
https://issues.apache.org/jira/browse/HBASE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494693#comment-16494693
]
Anoop Sam John commented on HBASE-20628:
----------------------------------------
In fact I am having a patch for a custom scanner and SQM for flushes. I
totally forgot the details of that.. Need to revisit that.. IMHO, we can
commit this patch.
bq.replace the implementation of SegmentScanner::reseek(cell) with the code
from CollectionBackedScanner::reseek(cell) which does not create a new iterator
and simply runs next until reaching the cell.
I tried this once but not got perf improvement as with CollectionBackedScanner.
SegmentScanner::reseek - Lets work on that on another issue.
> SegmentScanner does over-comparing when one flushing
> ----------------------------------------------------
>
> Key: HBASE-20628
> URL: https://issues.apache.org/jira/browse/HBASE-20628
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: stack
> Priority: Critical
> Fix For: 2.0.1
>
> Attachments: HBASE-20628.branch-2.0.001 (1).patch,
> HBASE-20628.branch-2.0.001.patch, HBASE-20628.branch-2.0.001.patch,
> HBASE-20628.branch-2.0.002.patch, Screen Shot 2018-05-25 at 9.38.00 AM.png,
> hits-20628.png
>
>
> Flushing memstore is taking too long. It looks like we are doing a bunch of
> comparing out of a new facility in hbase2, the Segment scanner at flush time.
> Below is a patch from [~anoop.hbase]. I had a similar more hacky version.
> Both undo the extra comparing we were seeing in perf tests.
> [~anastas] and [~eshcar]. Need your help please.
> As I read it, we are trying to flush the memstore snapshot (default, no IMC
> case). There is only ever going to be one Segment involved (even if IMC is
> enabled); the snapshot Segment. But the getScanners is returning a list (of
> one) Scanners and the scan is via the generic SegmentScanner which is all
> about a bunch of stuff we don't need when doing a flush so it seems to do
> more work than is necessary. It also supports scanning backwards which is not
> needed when trying to flush memstore.
> Do you see a problem doing a version of Anoops patch (whether IMC or not)? It
> makes a big difference in general throughput when the below patch is in
> place. Thanks.
> {code}
> diff --git
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
>
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> index cbd60e5da3..c3dd972254 100644
> ---
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> +++
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> @@ -40,7 +40,8 @@ public class MemStoreSnapshot implements Closeable {
> this.cellsCount = snapshot.getCellsCount();
> this.memStoreSize = snapshot.getMemStoreSize();
> this.timeRangeTracker = snapshot.getTimeRangeTracker();
> - this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
> + //this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
> + this.scanners = snapshot.getScannersForSnapshot();
> this.tagsPresent = snapshot.isTagsPresent();
> }
> diff --git
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
>
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> index 70074bf3b4..279c4e50c8 100644
> ---
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> +++
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> @@ -33,6 +33,7 @@ import org.apache.hadoop.hbase.KeyValueUtil;
> import org.apache.hadoop.hbase.io.TimeRange;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.ClassSize;
> +import org.apache.hadoop.hbase.util.CollectionBackedScanner;
> import org.apache.yetus.audience.InterfaceAudience;
> import org.slf4j.Logger;
> import
> org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
> @@ -130,6 +131,10 @@ public abstract class Segment {
> return Collections.singletonList(new SegmentScanner(this, readPoint,
> order));
> }
> + public List<KeyValueScanner> getScannersForSnapshot() {
> + return Collections.singletonList(new
> CollectionBackedScanner(this.cellSet.get(), comparator));
> + }
> +
> /**
> * @return whether the segment has any cells
> */
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)