[ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945672#comment-14945672 ]
stack commented on HBASE-14420:
-------------------------------
Going over the last 40 patch builds:
TestReplicationShell hung three times. It was added to master only, by
HBASE-13084, and it runs all the shell tests again plus the new
replication_admin_test.rb. I'm going to disable it for now in
HBASE-14561.
TestHFileOutputFormat2 failed 5 times in the last 40 runs. I spent time on it
yesterday. It seems to rely on test order, but I was having networking
issues that made it hard to diagnose. It also seems like an
ambitious amount of work to get done in a unit test:
{code}
* Simple test for {@link CellSortReducer} and {@link HFileOutputFormat2}.
* Sets up and runs a mapreduce job that writes hfile output.
* Creates a few inner classes to implement splits and an inputformat that
* emits keys and values like those of {@link PerformanceEvaluation}.
{code}
It was added a good while ago, here:
commit e4f8a7419fb4bd0102eaf91e9747de6261e0b5c5
Author: jxiang <jxiang@unknown>
Date: Fri Feb 21 20:39:21 2014 +0000
HBASE-10526 Using Cell instead of KeyValue in HFileOutputFormat
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1570702 13f79535-47bb-0310-9956-ffa450edef68
I'm just going to disable it until someone wants to work on it.
Here is the list of all test failures and their counts:
2 Hanging test : org.apache.hadoop.hbase.TestNodeHealthCheckChore
1 Hanging test : org.apache.hadoop.hbase.TestPartialResultsFromClientSide
2 Hanging test : org.apache.hadoop.hbase.client.TestFromClientSide
1 Hanging test : org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
1 Hanging test : org.apache.hadoop.hbase.client.TestReplicasClient
3 Hanging test : org.apache.hadoop.hbase.client.TestReplicationShell
1 Hanging test : org.apache.hadoop.hbase.constraint.TestConstraint
1 Hanging test : org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd
1 Hanging test : org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite
2 Hanging test : org.apache.hadoop.hbase.mapreduce.TestCopyTable
1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
5 Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2
1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat
1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableInputFormat
1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan2
1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
1 Hanging test : org.apache.hadoop.hbase.replication.TestMasterReplication
1 Hanging test : org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
1 Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint
1 Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpointNoMaster
1 Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager
1 Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController
1 Hanging test : org.apache.hadoop.hbase.security.access.TestCellACLs
1 Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelReplicationWithExpAsString
1 Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes
1 Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay
1 Hanging test : org.apache.hadoop.hbase.snapshot.TestExportSnapshot
1 Hanging test : org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot
1 Hanging test : org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient
1 Hanging test : org.apache.hadoop.hbase.snapshot.TestMobSecureExportSnapshot
1 Hanging test : org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot
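For anyone wanting to reproduce a tally like the one above, here is a minimal sketch in Java. It is not the script that produced this list; it assumes (hypothetically) that each build's report has one line per hung test of the form "Hanging test : <class>", and the class name HangTally is made up. A TreeMap keeps the classes in natural String order, which happens to match the ordering of the list above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies hung test classes across many build reports. */
public class HangTally {
    // Assumed marker that precedes the test class name on each report line.
    private static final String MARKER = "Hanging test :";

    /** Counts how often each test class is reported as hanging. */
    public static Map<String, Integer> tally(List<String> reportLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by class name
        for (String line : reportLines) {
            int idx = line.indexOf(MARKER);
            if (idx < 0) {
                continue; // not a hang report line
            }
            String testClass = line.substring(idx + MARKER.length()).trim();
            if (!testClass.isEmpty()) {
                counts.merge(testClass, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Formats the tally in the "N Hanging test : class" style used above. */
    public static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getValue()).append(" Hanging test : ").append(e.getKey()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample input: hang lines gathered from a few hypothetical builds.
        List<String> lines = List.of(
            "Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2",
            "Hanging test : org.apache.hadoop.hbase.client.TestReplicationShell",
            "Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2");
        System.out.print(format(tally(lines)));
    }
}
```

With the sample input, this prints TestReplicationShell with a count of 1 followed by TestHFileOutputFormat2 with a count of 2. Real reports wrapped by mail clients (as in the raw list) would need their lines rejoined first.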
> Zombie Stomping Session
> -----------------------
>
> Key: HBASE-14420
> URL: https://issues.apache.org/jira/browse/HBASE-14420
> Project: HBase
> Issue Type: Umbrella
> Components: test
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Attachments: hangers.txt
>
>
> Patch builds are now failing most of the time because we are dropping zombies.
> I have confirmed we are doing this on non-Apache build boxes too.
> Left-over zombies consume resources on build boxes (OOME: cannot create native
> threads). Having to do multiple test runs in the hope that we get a
> non-zombie-making build, or making (arbitrary) rulings that the zombies are
> 'not related', is a productivity sink. And so on...
> This is an umbrella issue for a zombie stomping session that started earlier
> this week. I will hang sub-issues off this one. I am running builds back-to-back
> on a little cluster to turn out the monsters.