[jira] [Updated] (HBASE-20628) SegmentScanner does over-comparing when one Segment only

stack (JIRA) Wed, 23 May 2018 10:31:22 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


stack updated HBASE-20628:
--------------------------
    Description: 
Flushing memstore is taking too long. It looks like we are doing a bunch of 
comparing out of a new facility in hbase2, the Segment scanner at flush time.

Below is a patch from [~anoop.hbase]. I had a similar more hacky version. Both 
undo the extra comparing we were seeing in perf tests.

[~anastas] and [~eshcar]. Need your help please.

As I read it, we are trying to flush the memstore snapshot (default, no IMC 
case). There is only ever going to be one Segment involved (even if IMC is 
enabled); the snapshot Segment. But the getScanners is returning a list (of 
one)  Scanners and the scan is via the generic SegmentScanner which is all 
about a bunch of stuff we don't need when doing a flush so it seems to do more 
work than is necessary. It also supports scanning backwards which is not needed 
when trying to flush memstore.

Do you see a problem doing a version of Anoops patch (whether IMC or not)? It 
makes a big difference in general throughput when the below patch is in place. 
Thanks.


{code}
diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
index cbd60e5da3..c3dd972254 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
@@ -40,7 +40,8 @@ public class MemStoreSnapshot implements Closeable {
     this.cellsCount = snapshot.getCellsCount();
     this.memStoreSize = snapshot.getMemStoreSize();
     this.timeRangeTracker = snapshot.getTimeRangeTracker();
-    this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
+    //this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
+    this.scanners = snapshot.getScannersForSnapshot();
     this.tagsPresent = snapshot.isTagsPresent();
   }

diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
index 70074bf3b4..279c4e50c8 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
@@ -33,6 +33,7 @@ import org.apache.hadoop.hbase.KeyValueUtil;
 import org.apache.hadoop.hbase.io.TimeRange;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.util.ClassSize;
+import org.apache.hadoop.hbase.util.CollectionBackedScanner;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import 
org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
@@ -130,6 +131,10 @@ public abstract class Segment {
     return Collections.singletonList(new SegmentScanner(this, readPoint, 
order));
   }

+  public List<KeyValueScanner> getScannersForSnapshot() {
+    return Collections.singletonList(new 
CollectionBackedScanner(this.cellSet.get(), comparator));
+  }
+
   /**
    * @return whether the segment has any cells
    */
{code}



  was:
Flushing memstore is taking too long. It looks like we are doing a bunch of 
comparing out of a new facility in hbase2, the Segment scanner at flush time.

Below is a patch from [~anoop.hbase]. I had a similar more hacky version. Both 
undo the extra comparing we were seeing in per tests.

[~anastas] and [~eshcar]. Need your help please.

As I read it, we are trying to flush the memstore snapshot (default, no IMC 
case). There is only ever going to be one Segment involved (even if IMC is 
enabled); the snapshot Segment. But the getScanners is returning a list of 
Scanners and the scan is via the generic SegmentScanner which is all about 
Scanning over multiple Segments of a pipeline and so seems to do more work than 
is necessary in this case at least. It also supports scanning backwards which 
is not needed when trying to flush memstore.

Do you see a problem doing a version of Anoops patch (whether IMC or not)? It 
makes a big difference in general throughput when the below patch is in place. 
Thanks.


{code}
diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
index cbd60e5da3..c3dd972254 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
@@ -40,7 +40,8 @@ public class MemStoreSnapshot implements Closeable {
     this.cellsCount = snapshot.getCellsCount();
     this.memStoreSize = snapshot.getMemStoreSize();
     this.timeRangeTracker = snapshot.getTimeRangeTracker();
-    this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
+    //this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
+    this.scanners = snapshot.getScannersForSnapshot();
     this.tagsPresent = snapshot.isTagsPresent();
   }

diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
index 70074bf3b4..279c4e50c8 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
@@ -33,6 +33,7 @@ import org.apache.hadoop.hbase.KeyValueUtil;
 import org.apache.hadoop.hbase.io.TimeRange;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.util.ClassSize;
+import org.apache.hadoop.hbase.util.CollectionBackedScanner;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import 
org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
@@ -130,6 +131,10 @@ public abstract class Segment {
     return Collections.singletonList(new SegmentScanner(this, readPoint, 
order));
   }

+  public List<KeyValueScanner> getScannersForSnapshot() {
+    return Collections.singletonList(new 
CollectionBackedScanner(this.cellSet.get(), comparator));
+  }
+
   /**
    * @return whether the segment has any cells
    */
{code}




> SegmentScanner does over-comparing when one Segment only
> --------------------------------------------------------
>
>                 Key: HBASE-20628
>                 URL: https://issues.apache.org/jira/browse/HBASE-20628
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.1
>
>
> Flushing memstore is taking too long. It looks like we are doing a bunch of 
> comparing out of a new facility in hbase2, the Segment scanner at flush time.
> Below is a patch from [~anoop.hbase]. I had a similar more hacky version. 
> Both undo the extra comparing we were seeing in perf tests.
> [~anastas] and [~eshcar]. Need your help please.
> As I read it, we are trying to flush the memstore snapshot (default, no IMC 
> case). There is only ever going to be one Segment involved (even if IMC is 
> enabled); the snapshot Segment. But the getScanners is returning a list (of 
> one)  Scanners and the scan is via the generic SegmentScanner which is all 
> about a bunch of stuff we don't need when doing a flush so it seems to do 
> more work than is necessary. It also supports scanning backwards which is not 
> needed when trying to flush memstore.
> Do you see a problem doing a version of Anoops patch (whether IMC or not)? It 
> makes a big difference in general throughput when the below patch is in 
> place. Thanks.
> {code}
> diff --git 
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
>  
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> index cbd60e5da3..c3dd972254 100644
> --- 
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> +++ 
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
> @@ -40,7 +40,8 @@ public class MemStoreSnapshot implements Closeable {
>      this.cellsCount = snapshot.getCellsCount();
>      this.memStoreSize = snapshot.getMemStoreSize();
>      this.timeRangeTracker = snapshot.getTimeRangeTracker();
> -    this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
> +    //this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
> +    this.scanners = snapshot.getScannersForSnapshot();
>      this.tagsPresent = snapshot.isTagsPresent();
>    }
> diff --git 
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
>  
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> index 70074bf3b4..279c4e50c8 100644
> --- 
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> +++ 
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
> @@ -33,6 +33,7 @@ import org.apache.hadoop.hbase.KeyValueUtil;
>  import org.apache.hadoop.hbase.io.TimeRange;
>  import org.apache.hadoop.hbase.util.Bytes;
>  import org.apache.hadoop.hbase.util.ClassSize;
> +import org.apache.hadoop.hbase.util.CollectionBackedScanner;
>  import org.apache.yetus.audience.InterfaceAudience;
>  import org.slf4j.Logger;
>  import 
> org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
> @@ -130,6 +131,10 @@ public abstract class Segment {
>      return Collections.singletonList(new SegmentScanner(this, readPoint, 
> order));
>    }
> +  public List<KeyValueScanner> getScannersForSnapshot() {
> +    return Collections.singletonList(new 
> CollectionBackedScanner(this.cellSet.get(), comparator));
> +  }
> +
>    /**
>     * @return whether the segment has any cells
>     */
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20628) SegmentScanner does over-comparing when one Segment only

Reply via email to