[
https://issues.apache.org/jira/browse/HBASE-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602322#comment-15602322
]
Yu Li commented on HBASE-16931:
-------------------------------
All of the timed-out cases failed because of OOME (why are OOMEs so frequent?):
{noformat}
Running org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd
Exception in thread "Thread-2475" java.lang.OutOfMemoryError: Java heap space
Running org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster
Running org.apache.hadoop.hbase.tool.TestCanaryTool
Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2505" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2507" java.lang.OutOfMemoryError: Java heap space
{noformat}
And the failed case seems to have encountered an environment issue (_Unable to
create region directory_):
{noformat}
Running org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 23.592 sec <<< FAILURE! - in org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
testScanYZYToEmpty(org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat) Time elapsed: 0.044 sec <<< ERROR!
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Unable to create region directory: /tmp/scantest1_snapshot__8235bb48-4e7b-4e00-ad80-b2ce716c8522/data/default/scantest1/519e450e89d832d702a416a9bca04b5d
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180)
at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:527)
at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:234)
at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:170)
at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:736)
at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshot(MultiTableSnapshotInputFormatImpl.java:249)
at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshots(MultiTableSnapshotInputFormatImpl.java:243)
at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.setInput(MultiTableSnapshotInputFormatImpl.java:80)
at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat.setInput(MultiTableSnapshotInputFormat.java:106)
at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initMultiTableSnapshotMapperJob(TableMapReduceUtil.java:319)
at org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat.initJob(TestMultiTableSnapshotInputFormat.java:72)
{noformat}
I ran the above 4 cases locally and confirmed that all of them pass.
> Setting cell's seqId to zero in compaction flow might cause RS down.
> --------------------------------------------------------------------
>
> Key: HBASE-16931
> URL: https://issues.apache.org/jira/browse/HBASE-16931
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0
> Reporter: binlijin
> Assignee: binlijin
> Priority: Critical
> Attachments: HBASE-16931-master.patch, HBASE-16931.branch-1.patch,
> HBASE-16931.branch-1.v2.patch, HBASE-16931_master_v2.patch,
> HBASE-16931_master_v3.patch, HBASE-16931_master_v4.patch,
> HBASE-16931_master_v5.patch
>
>
> Compactor#performCompaction
> do {
>   hasMore = scanner.next(cells, scannerContext);
>   // output to writer:
>   for (Cell c : cells) {
>     if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
>       CellUtil.setSequenceId(c, 0);
>     }
>     writer.append(c);
>   }
>   cells.clear();
> } while (hasMore);
> scanner.next returns at most "hbase.hstore.compaction.kv.max" KVs per call,
> and the last cell returned is still referenced by StoreScanner.prevCell. So if
> cleanSeqId is set and that cell's seqId is reset to zero, the next scanner.next
> call may throw an exception in StoreScanner.checkScanOrder and bring the
> regionserver down.
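The failure mode described above can be sketched in isolation. This is a minimal model, not HBase code: `SimpleCell`, `compare`, and `inScanOrder` are hypothetical stand-ins for Cell, the cell comparator, and the check done in StoreScanner.checkScanOrder, and the ordering is assumed to be "row ascending, then higher seqId first". It only shows how mutating a cell that the scanner still holds as its previous cell can invalidate the order check on the following call.

```java
public class SeqIdOrderSketch {
    // Hypothetical stand-in for an HBase Cell: only the fields needed here.
    static final class SimpleCell {
        final String row;
        long seqId; // mutable, like a cell whose sequence id can be reset

        SimpleCell(String row, long seqId) {
            this.row = row;
            this.seqId = seqId;
        }
    }

    // Assumed simplified ordering: row ascending, then seqId descending
    // (for identical keys, the cell with the higher seqId sorts first).
    static int compare(SimpleCell a, SimpleCell b) {
        int byRow = a.row.compareTo(b.row);
        if (byRow != 0) {
            return byRow;
        }
        return Long.compare(b.seqId, a.seqId);
    }

    // Models the scan-order check: the previous cell must not sort after the next one.
    static boolean inScanOrder(SimpleCell prev, SimpleCell next) {
        return prev == null || compare(prev, next) <= 0;
    }

    public static void main(String[] args) {
        // Last cell of one batch; the scanner keeps a reference to it.
        SimpleCell prev = new SimpleCell("row1", 10);
        // First cell of the following batch, same key with an older seqId.
        SimpleCell next = new SimpleCell("row1", 5);

        System.out.println("before reset: " + inScanOrder(prev, next));

        // The compaction loop resets the seqId on the shared object.
        prev.seqId = 0;

        System.out.println("after reset:  " + inScanOrder(prev, next));
    }
}
```

In this model the check passes before the reset and fails after it, because the object held as the previous cell has been changed underneath the scanner.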