[ https://issues.apache.org/jira/browse/HBASE-14736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988294#comment-14988294 ]
stack commented on HBASE-14736:
-------------------------------
I tried just getting the first 1k in each partition. I then hit an interesting
issue where WALs were moving out from under me while the task ran... they were
no longer found. I got the first-1k job to pass, but it found no keys. I need
to dig in more. It seems that at large scale these tools, as currently
written, no longer work (I lost use of the cluster, so I could pursue it no
further for the time being).
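A minimal Java sketch of the "first 1k in each partition" idea: cap how many keys are read from each partition so the in-memory sorted set stays bounded no matter how many keys there are to search. The names `BoundedKeySample` and `firstNPerPartition` are hypothetical, and plain `Iterator<byte[]>` partitions stand in for whatever actually reads the SequenceFile output; this is an illustration, not the ITBLL Search code. The comparator orders byte arrays as unsigned bytes, lexicographically, which is the order HBase uses for row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: keep at most N keys from each partition. */
public final class BoundedKeySample {

  /** Unsigned, lexicographic byte[] order (the order HBase sorts row keys in). */
  static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  };

  /**
   * Reads at most perPartitionLimit keys from each partition iterator
   * (the iterators are assumed stand-ins for the real file readers).
   */
  public static SortedSet<byte[]> firstNPerPartition(
      List<Iterator<byte[]>> partitions, int perPartitionLimit) {
    SortedSet<byte[]> keys = new TreeSet<>(UNSIGNED_LEX);
    for (Iterator<byte[]> partition : partitions) {
      int taken = 0;
      // Stop reading this partition once the limit is reached; the rest is skipped.
      while (taken < perPartitionLimit && partition.hasNext()) {
        keys.add(partition.next());
        taken++;
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    List<Iterator<byte[]>> partitions = new ArrayList<>();
    partitions.add(Arrays.asList(new byte[] {3}, new byte[] {1}, new byte[] {2}).iterator());
    partitions.add(Arrays.asList(new byte[] {(byte) 0xff}, new byte[] {0}).iterator());
    // Takes {3},{1} from the first partition and {0xff},{0} from the second.
    for (byte[] k : firstNPerPartition(partitions, 2)) {
      System.out.println(k[0] & 0xff); // prints 0, 1, 3, 255 (unsigned order)
    }
  }
}
```

The trade-off is that a per-partition cap only looks at the head of each partition, so it can tell you what happened to some of the lost keys, not to all of them.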
> ITBLL debugging search tool OOMEs on big dataset
> ------------------------------------------------
>
> Key: HBASE-14736
> URL: https://issues.apache.org/jira/browse/HBASE-14736
> Project: HBase
> Issue Type: Bug
> Reporter: stack
>
> I ran an ITBLL on an 80-node cluster sized to do 100B items. The job failed
> with 300M undefined items (branch-1). I tried to run the search tool to
> debug the loss (see
> https://docs.google.com/document/d/14Tvu5yWYNBDFkh8xCqLkU9tlyNWhJv3GjDGOkqZU1eE/edit#
> ), but it OOME'd:
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:834)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:2452)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat$SequenceFileAsBinaryRecordReader.nextKeyValue(SequenceFileAsBinaryInputFormat.java:119)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Search.readFileToSearch(IntegrationTestBigLinkedList.java:775)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Search.readKeysToSearch(IntegrationTestBigLinkedList.java:757)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Search.run(IntegrationTestBigLinkedList.java:726)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Search.run(IntegrationTestBigLinkedList.java:657)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.runTestFromCommandLine(IntegrationTestBigLinkedList.java:1646)
>     at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:123)
>     at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.main(IntegrationTestBigLinkedList.java:1686)
> {code}
> It's trying to build a sorted set out of all 300M items. Dang.
> The 10B test passed.
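A rough estimate of why one sorted set of 300M keys exhausts the heap, under loudly stated assumptions: 16-byte row keys, a `TreeSet<byte[]>`, and a 64-bit JVM with compressed oops (the report does not say which set implementation Search uses). Each key then costs about 32 bytes for the `byte[]` (16-byte header plus 16 bytes of data) plus about 40 bytes for the backing tree entry, so roughly 72 bytes per key, and 300M × 72 B ≈ 21.6 GB before any other overhead. The hypothetical `BatchedKeySearch` sketch below bounds peak memory instead by filling a sorted set only up to `batchSize`, handing it to a consumer (for example, one search pass), and then starting a fresh set. Each batch is sorted on its own; there is no global order across batches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Hypothetical sketch: process keys in bounded, individually sorted batches. */
public final class BatchedKeySearch {

  /** Unsigned, lexicographic byte[] order (the order HBase sorts row keys in). */
  static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  };

  /** Calls searchBatch once per batch of at most batchSize distinct keys; returns the batch count. */
  public static int forEachSortedBatch(Iterator<byte[]> keys, int batchSize,
      Consumer<SortedSet<byte[]>> searchBatch) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int batches = 0;
    SortedSet<byte[]> batch = new TreeSet<>(UNSIGNED_LEX);
    while (keys.hasNext()) {
      batch.add(keys.next());
      if (batch.size() == batchSize) {
        searchBatch.accept(batch);
        batches++;
        // New set rather than clear(), so a consumer holding the old batch never sees it mutate.
        batch = new TreeSet<>(UNSIGNED_LEX);
      }
    }
    if (!batch.isEmpty()) {
      searchBatch.accept(batch); // final partial batch
      batches++;
    }
    return batches;
  }

  public static void main(String[] args) {
    List<byte[]> keys = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      keys.add(new byte[] {(byte) i});
    }
    int n = forEachSortedBatch(keys.iterator(), 4,
        b -> System.out.println("batch of " + b.size()));
    System.out.println(n + " batches"); // batches of 4, 4, 2, then "3 batches"
  }
}
```

The design trades memory for time: peak heap is bounded by one batch, but the search has to run once per batch rather than once overall.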
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)