[ https://issues.apache.org/jira/browse/HBASE-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641597#action_12641597 ]
stack commented on HBASE-946:
-----------------------------
Here is code I added to HStore to do store-scoped iteration and compactions:
{code}
Index: src/java/org/apache/hadoop/hbase/regionserver/HStore.java
===================================================================
--- src/java/org/apache/hadoop/hbase/regionserver/HStore.java (revision 706719)
+++ src/java/org/apache/hadoop/hbase/regionserver/HStore.java (working copy)
@@ -47,6 +47,7 @@
 import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.hbase.HRegionInfo;
 import org.apache.hadoop.hbase.HStoreKey;
+import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.RemoteExceptionHandler;
 import org.apache.hadoop.hbase.filter.RowFilterInterface;
 import org.apache.hadoop.hbase.io.Cell;
@@ -1977,4 +1978,40 @@
   HRegionInfo getHRegionInfo() {
     return this.info;
   }
+
+  public static void main(String [] args) throws IOException {
+    Path storedir = new Path("/tmp/streams");
+    HTableDescriptor htd = new HTableDescriptor("streams");
+    HColumnDescriptor hcd = new HColumnDescriptor("items:");
+    htd.addFamily(hcd);
+    htd.addFamily(new HColumnDescriptor("content:"));
+    HRegionInfo hri = new HRegionInfo(htd, HConstants.EMPTY_BYTE_ARRAY,
+      HConstants.EMPTY_BYTE_ARRAY, false, 1);
+    long startTime = System.currentTimeMillis();
+    HBaseConfiguration c = new HBaseConfiguration();
+    HStore store = new HStore(storedir, hri, hcd, FileSystem.getLocal(c), null,
+      c, null);
+    store.compact(true);
+    HStoreKey hsk = new HStoreKey();
+    SortedMap<byte [], Cell> value =
+      new TreeMap<byte [], Cell>(Bytes.BYTES_COMPARATOR);
+    byte [][] columns = new byte [1][];
+    columns[0] = Bytes.toBytes("items:.*");
+    InternalScanner s = store.getScanner(HConstants.LATEST_TIMESTAMP, columns,
+      HConstants.EMPTY_BYTE_ARRAY, null);
+    long rows = 0;
+    long values = 0;
+    try {
+      while (s.next(hsk, value)) {
+        rows++;
+        LOG.info("" + rows + ": " + hsk + " " + value.size());
+        values += value.size();
+        value.clear();
+      }
+      LOG.info("TIME: " + (System.currentTimeMillis() - startTime) + " rows=" +
+        rows + ", values=" + values);
+    } finally {
+      s.close();
+      store.close();
+    }
+  }
 }
{code}
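In outline, the main() above opens the store directly against a local-filesystem copy of the store directory, runs a compaction, and then scans just the items: family with an InternalScanner, counting rows and values and logging the elapsed time.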
Should have been able to do it in jruby. Didn't try.
> Row with 55k deletes times out scanner lease
> --------------------------------------------
>
> Key: HBASE-946
> URL: https://issues.apache.org/jira/browse/HBASE-946
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Blocker
> Fix For: 0.18.1, 0.19.0
>
>
> Made a blocker because it was found by Jon Gray (smile)
> So, Jon Gray has a row with 55k deletes in it. When he tries to scan, his
> scanner times out when it gets to this row. The root cause is the mechanism
> we use to make sure a delete in a new store file overshadows an entry at the
> same address in an old file. We accumulate a List of all deletes
> encountered. Before adding a delete to the List, we check whether it is
> already present. This check is what's killing us. One issue is that it does
> a super inefficient check of whether the table is root, but even after
> fixing this inefficiency -- and then removing the check for root since it's
> redundant -- we're still too slow.
> Chatting with Jim K, he suggested that the ArrayList check is linear.
> Changing the aggregation of deletes to use a HashSet instead makes it all
> run an order of magnitude faster (see the sketch below).
> Also as part of this issue, we need to figure out why compaction is not
> letting go of these deletes.
> Filing this issue against 0.18.1 so it gets into the RC2 (after chatting w/
> J-D and JK -- J-D is seeing the issue also).
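For illustration only, here is a standalone sketch of why the List-versus-HashSet change described above matters at 55k deletes. It is not HStore code: the class name and the String keys standing in for the delete keys are made up for the sketch. ArrayList.contains() scans linearly on every check, so accumulating n deletes costs O(n^2) comparisons (worst case on the order of 1.5 billion for 55k), while HashSet.contains() is constant time on average, making the same pass O(n). The real fix also relies on the delete key type having consistent hashCode()/equals().
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteTrackingSketch {
  public static void main(String[] args) {
    final int deletes = 55000; // roughly the row size from the report

    // List-based accumulation: every contains() is a linear scan over what
    // has been added so far, so the whole pass is O(n^2).
    List<String> deleteList = new ArrayList<String>();
    long start = System.currentTimeMillis();
    for (int i = 0; i < deletes; i++) {
      String key = "row/items:" + i; // stand-in for a delete's store key
      if (!deleteList.contains(key)) {
        deleteList.add(key);
      }
    }
    System.out.println("ArrayList: " + (System.currentTimeMillis() - start) + "ms");

    // Set-based accumulation: contains()/add() hash lookups are constant
    // time on average, so the same pass is O(n).
    Set<String> deleteSet = new HashSet<String>();
    start = System.currentTimeMillis();
    for (int i = 0; i < deletes; i++) {
      String key = "row/items:" + i;
      if (!deleteSet.contains(key)) {
        deleteSet.add(key);
      }
    }
    System.out.println("HashSet: " + (System.currentTimeMillis() - start) + "ms");
  }
}
{code}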