[jira] [Commented] (HBASE-20628) SegmentScanner does over-comparing when one flushing

stack (JIRA) Sun, 27 May 2018 14:36:41 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492166#comment-16492166
 ]


stack commented on HBASE-20628:
-------------------------------

[~eshcar] Thanks for repaste. Yeah, I missed it. My fault (I got the comment in 
RB... thanks). Comments below.

bq. I don't think it is related to the fact that SegmentScanner supports 
backward scans because this code is simply not utilised in the case of flush.

Agree.

bq. ....could this be done in a more efficient way

Does it have to be created each time? Figuring out where in the CSLM to start 
returning values is perhaps the source of the added compare load seen when 
profiling; each time we run through it anew.
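
To make the question above concrete, here is a toy sketch in plain JDK code (not HBase code) that counts comparator calls on a ConcurrentSkipListMap. It compares one iterator walked start to finish against re-finding the position from the map for every element. The class and method names (SkipListCompareCount, CountingComparator, singleIterator, reseekEachStep) are made up for illustration, and the re-seek loop is only an assumed stand-in for whatever the scanner really does per cell.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; nothing here is HBase API.
final class SkipListCompareCount {

  /** Comparator that counts how many times it is invoked. */
  static final class CountingComparator implements Comparator<Integer> {
    final AtomicLong calls = new AtomicLong();

    @Override
    public int compare(Integer a, Integer b) {
      calls.incrementAndGet();
      return Integer.compare(a, b);
    }
  }

  private static ConcurrentSkipListMap<Integer, Integer> build(CountingComparator c, int n) {
    ConcurrentSkipListMap<Integer, Integer> map = new ConcurrentSkipListMap<>(c);
    for (int i = 0; i < n; i++) {
      map.put(i, i);
    }
    c.calls.set(0); // ignore the compares spent on inserts
    return map;
  }

  /** Straight-line read with one iterator. Returns {cellsVisited, compares}. */
  static long[] singleIterator(int n) {
    CountingComparator c = new CountingComparator();
    ConcurrentSkipListMap<Integer, Integer> map = build(c, n);
    long seen = 0;
    for (Integer ignored : map.keySet()) {
      seen++;
    }
    return new long[] {seen, c.calls.get()};
  }

  /** Re-locates the position via a fresh tail view per element. Returns {cellsVisited, compares}. */
  static long[] reseekEachStep(int n) {
    CountingComparator c = new CountingComparator();
    ConcurrentSkipListMap<Integer, Integer> map = build(c, n);
    long seen = 0;
    Map.Entry<Integer, Integer> e = map.firstEntry();
    while (e != null) {
      seen++;
      // New view each step: the skip list has to search for the start key again.
      e = map.tailMap(e.getKey(), false).firstEntry();
    }
    return new long[] {seen, c.calls.get()};
  }
}
```

Calling SkipListCompareCount.singleIterator(n) and SkipListCompareCount.reseekEachStep(n) with the same n should visit the same number of entries, with the second reporting noticeably more comparator calls. Exact counts depend on the JDK, so none are quoted here.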

bq. ...could it be that we ask to reseek for a cell that precedes the current 
cell during a flush? 

During a flush, I cannot think of why we'd ever do anything other than a 
straight-line read through everything in the snapshot. Should we add an assert 
that throws if we ever do otherwise? I could then run it w/ various loadings to 
see if it trips.
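
One possible shape for such a check, as a minimal standalone sketch: a small guard that remembers the last cell handed out and throws if a later next() or reseek() target sorts before it. ForwardOnlyGuard and its methods are hypothetical names, and a plain Comparator stands in for the cell comparator; this is not the scanner's actual code.

```java
import java.util.Comparator;

// Hypothetical helper; would be called from a scanner's next()/reseek() paths.
final class ForwardOnlyGuard<T> {
  private final Comparator<? super T> comparator;
  private T last; // last cell returned by next(), null before the first one

  ForwardOnlyGuard(Comparator<? super T> comparator) {
    this.comparator = comparator;
  }

  /** Record a cell returned by next(); it must not sort before the previous one. */
  void onNext(T cell) {
    check(cell, "next");
    last = cell;
  }

  /** Check a reseek target; it must not sort before the last returned cell. */
  void onReseek(T target) {
    check(target, "reseek");
  }

  private void check(T cell, String op) {
    if (last != null && comparator.compare(cell, last) < 0) {
      throw new IllegalStateException(
          op + " went backwards during flush: " + cell + " precedes " + last);
    }
  }
}
```

For example, with new ForwardOnlyGuard<>(Comparator.<String>naturalOrder()), calling onNext("row2") followed by onReseek("row1") throws, while onReseek("row2") or onReseek("row3") passes. Throwing an exception rather than using a Java assert keeps the check active even when the JVM runs without -ea.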

bq. Why even call reseek(cell) and not only next() during a flush?

Do we? I've not dug in.

So, you think the patch in the description is the way to go? I was thinking it 
arcane -- why would you want a special scanner for flushing -- but you think 
this is worth special-casing? If so, no problem... I can put up a patch.

Thanks for input.


> SegmentScanner does over-comparing when one flushing
> ----------------------------------------------------
>
>                 Key: HBASE-20628
>                 URL: https://issues.apache.org/jira/browse/HBASE-20628
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>            Priority: Critical
>             Fix For: 2.0.1
>
>         Attachments: HBASE-20628.branch-2.0.001 (1).patch, 
> HBASE-20628.branch-2.0.001.patch, HBASE-20628.branch-2.0.001.patch, 
> HBASE-20628.branch-2.0.002.patch, Screen Shot 2018-05-25 at 9.38.00 AM.png, 
> hits-20628.png
>
>
> Flushing memstore is taking too long. It looks like we are doing a bunch of 
> comparing out of a new facility in hbase2, the Segment scanner at flush time.
> Below is a patch from [~anoop.hbase]. I had a similar more hacky version. 
> Both undo the extra comparing we were seeing in perf tests.
> [~anastas] and [~eshcar]. Need your help please.
> As I read it, we are trying to flush the memstore snapshot (default, no IMC 
> case). There is only ever going to be one Segment involved (even if IMC is 
> enabled); the snapshot Segment. But getScanners is returning a list (of 
> one) of Scanners, and the scan is via the generic SegmentScanner, which 
> carries a bunch of stuff we don't need when doing a flush, so it seems to do 
> more work than is necessary. It also supports scanning backwards, which is not 
> needed when flushing the memstore.
> Do you see a problem doing a version of Anoop's patch (whether IMC or not)? It 
> makes a big difference in general throughput when the below patch is in 
> place. Thanks.
> {code}
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> index cbd60e5da3..c3dd972254 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> @@ -40,7 +40,8 @@ public class MemStoreSnapshot implements Closeable {
>      this.cellsCount = snapshot.getCellsCount();
>      this.memStoreSize = snapshot.getMemStoreSize();
>      this.timeRangeTracker = snapshot.getTimeRangeTracker();
> -    this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
> +    //this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
> +    this.scanners = snapshot.getScannersForSnapshot();
>      this.tagsPresent = snapshot.isTagsPresent();
>    }
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> index 70074bf3b4..279c4e50c8 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> @@ -33,6 +33,7 @@ import org.apache.hadoop.hbase.KeyValueUtil;
>  import org.apache.hadoop.hbase.io.TimeRange;
>  import org.apache.hadoop.hbase.util.Bytes;
>  import org.apache.hadoop.hbase.util.ClassSize;
> +import org.apache.hadoop.hbase.util.CollectionBackedScanner;
>  import org.apache.yetus.audience.InterfaceAudience;
>  import org.slf4j.Logger;
>  import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
> @@ -130,6 +131,10 @@ public abstract class Segment {
>      return Collections.singletonList(new SegmentScanner(this, readPoint, order));
>    }
> +  public List<KeyValueScanner> getScannersForSnapshot() {
> +    return Collections.singletonList(new CollectionBackedScanner(this.cellSet.get(), comparator));
> +  }
> +
>    /**
>     * @return whether the segment has any cells
>     */
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
