[ https://issues.apache.org/jira/browse/HBASE-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641597#action_12641597 ]
stack commented on HBASE-946:
-----------------------------
Here is code I added to HStore to do store-scoped iteration and compactions:
{code}
Index: src/java/org/apache/hadoop/hbase/regionserver/HStore.java
===================================================================
--- src/java/org/apache/hadoop/hbase/regionserver/HStore.java (revision 706719)
+++ src/java/org/apache/hadoop/hbase/regionserver/HStore.java (working copy)
@@ -47,6 +47,7 @@
 import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.hbase.HRegionInfo;
 import org.apache.hadoop.hbase.HStoreKey;
+import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.RemoteExceptionHandler;
 import org.apache.hadoop.hbase.filter.RowFilterInterface;
 import org.apache.hadoop.hbase.io.Cell;
@@ -1977,4 +1978,40 @@
   HRegionInfo getHRegionInfo() {
     return this.info;
   }
+
+  public static void main(String [] args) throws IOException {
+    Path storedir = new Path("/tmp/streams");
+    HTableDescriptor htd = new HTableDescriptor("streams");
+    HColumnDescriptor hcd = new HColumnDescriptor("items:");
+    htd.addFamily(hcd);
+    htd.addFamily(new HColumnDescriptor("content:"));
+    HRegionInfo hri = new HRegionInfo(htd, HConstants.EMPTY_BYTE_ARRAY,
+      HConstants.EMPTY_BYTE_ARRAY, false, 1);
+    long startTime = System.currentTimeMillis();
+    HBaseConfiguration c = new HBaseConfiguration();
+    HStore store = new HStore(storedir, hri, hcd, FileSystem.getLocal(c), null,
+      c, null);
+    store.compact(true);
+    HStoreKey hsk = new HStoreKey();
+    SortedMap<byte [], Cell> value =
+      new TreeMap<byte [], Cell>(Bytes.BYTES_COMPARATOR);
+    byte [][] columns = new byte [1][];
+    columns[0] = Bytes.toBytes("items:.*");
+    InternalScanner s = store.getScanner(HConstants.LATEST_TIMESTAMP, columns,
+      HConstants.EMPTY_BYTE_ARRAY, null);
+    long rows = 0;
+    long values = 0;
+    try {
+      while (s.next(hsk, value)) {
+        rows++;
+        LOG.info("" + rows + ": " + hsk + " " + value.size());
+        values += value.size();
+        value.clear();
+      }
+      LOG.info("TIME: " + (System.currentTimeMillis() - startTime) + " rows=" +
+        rows + ", values=" + values);
+    } finally {
+      s.close();
+      store.close();
+    }
+  }
 }
{code}
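In outline, the main() above opens the store directly against a local-filesystem copy of the store directory, runs a compaction, and then scans just the items: family with an InternalScanner, counting rows and values and logging the elapsed time.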
Should have been able to do it in jruby. Didn't try.
> Row with 55k deletes times out scanner lease
> --------------------------------------------
>
> Key: HBASE-946
> URL: https://issues.apache.org/jira/browse/HBASE-946
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Blocker
> Fix For: 0.18.1, 0.19.0
>
>
> Made a blocker because it was found by Jon Gray (smile)
> So, Jon Gray has a row with 55k deletes in it. When he tries to scan, his
> scanner times out when it gets to this row. The root cause is the mechanism
> we use to make sure a delete in a new store file overshadows an entry at the
> same address in an old file. We accumulate a List of all deletes
> encountered. Before adding a delete to the List, we check whether it is
> already present. This check is what's killing us. One issue is that it does
> a super inefficient check of whether the table is root, but even after
> fixing this inefficiency -- and then removing the check for root since it's
> redundant -- we're still too slow.
> Chatting with Jim K, he suggested that the ArrayList check is linear.
> Changing the aggregation of deletes to use a HashSet instead makes it all
> run an order of magnitude faster (see the sketch below).
> Also as part of this issue, we need to figure out why compaction is not
> letting go of these deletes.
> Filing this issue against 0.18.1 so it gets into the RC2 (after chatting w/
> J-D and JK -- J-D is seeing the issue also).
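For illustration only, here is a standalone sketch of why the List-versus-HashSet change described above matters at 55k deletes. It is not HStore code: the class name and the String keys standing in for the delete keys are made up for the sketch. ArrayList.contains() scans linearly on every check, so accumulating n deletes costs O(n^2) comparisons (worst case on the order of 1.5 billion for 55k), while HashSet.contains() is constant time on average, making the same pass O(n). The real fix also relies on the delete key type having consistent hashCode()/equals().
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteTrackingSketch {
  public static void main(String[] args) {
    final int deletes = 55000; // roughly the row size from the report

    // List-based accumulation: every contains() is a linear scan over what
    // has been added so far, so the whole pass is O(n^2).
    List<String> deleteList = new ArrayList<String>();
    long start = System.currentTimeMillis();
    for (int i = 0; i < deletes; i++) {
      String key = "row/items:" + i; // stand-in for a delete's store key
      if (!deleteList.contains(key)) {
        deleteList.add(key);
      }
    }
    System.out.println("ArrayList: " + (System.currentTimeMillis() - start) + "ms");

    // Set-based accumulation: contains()/add() hash lookups are constant
    // time on average, so the same pass is O(n).
    Set<String> deleteSet = new HashSet<String>();
    start = System.currentTimeMillis();
    for (int i = 0; i < deletes; i++) {
      String key = "row/items:" + i;
      if (!deleteSet.contains(key)) {
        deleteSet.add(key);
      }
    }
    System.out.println("HashSet: " + (System.currentTimeMillis() - start) + "ms");
  }
}
{code}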