[
https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987669#action_12987669
]
Ramkumar Vadali commented on MAPREDUCE-2283:
--------------------------------------------
Update:
If I run `ant clean` from the top level and then run `ant test
-Dtestcase=TestBlockFixer`, it runs fine.
But if I run `ant test-patch` from the top level and then run the test again,
it gets stuck.
I ran with -Dtest.output=yes to see what was going on, and found this:
{code}
[junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:50197
[junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:50197
[junit] 11/01/27 09:21:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s).
[junit] 11/01/27 09:21:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s).
[junit] 11/01/27 09:21:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s).
[junit] 11/01/27 09:21:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 3 time(s).
[junit] 11/01/27 09:21:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 4 time(s).
[junit] 11/01/27 09:21:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 5 time(s).
[junit] 11/01/27 09:21:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 6 time(s).
[junit] 11/01/27 09:21:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 7 time(s).
[junit] 11/01/27 09:21:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 8 time(s).
[junit] 11/01/27 09:21:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 9 time(s).
[junit] 11/01/27 09:21:34 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not available yet, Zzzzz...
{code}
I think Hudson does something like this, and `ant test-patch` is somehow
pulling in a jar that prevents MiniMRCluster from starting. To check, I wrote a
simple test that only tries to start a MiniMRCluster:
{code}
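import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.MiniMRCluster;

/** Minimal repro: only starts a MiniDFSCluster and a MiniMRCluster. */
public class TestStuckMiniMR extends TestCase {
  public static final int NUM_DATANODES = 3;
  Configuration conf;
  String namenode = null;
  MiniDFSCluster dfs = null;
  MiniMRCluster mr = null;
  String jobTrackerName = null;
  FileSystem fileSys = null;

  protected void setUp() throws Exception {
    conf = new Configuration();
    dfs = new MiniDFSCluster(conf, NUM_DATANODES, true, null);
    dfs.waitActive();
    fileSys = dfs.getFileSystem();
    namenode = fileSys.getUri().toString();
    FileSystem.setDefaultUri(conf, namenode);
    // Judging by the log above, startup appears to get stuck at this point.
    mr = new MiniMRCluster(4, namenode, 3);
    jobTrackerName = "localhost:" + mr.getJobTrackerPort();
  }

  protected void tearDown() {
    // Guard against a partially completed setUp(); stop MR before the DFS it uses.
    if (mr != null) { mr.shutdown(); }
    if (dfs != null) { dfs.shutdown(); }
  }

  public void testStuck() throws Exception {
    // Reached only if both clusters started in setUp().
    System.out.println("Done");
  }
}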
{code}
This also gets stuck in setUp(), so I think the problem is outside RAID. In
fact, right after trying this, I ran a test under contrib/streaming. That
also gets stuck the same way.
{code}
ant test -Dtestcase=TestFileArgs -Dtest.output=yes
{code}
The output:
{code}
[junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:59339
[junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:59339
[junit] 11/01/27 09:42:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s).
[junit] 11/01/27 09:42:12 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s).
[junit] 11/01/27 09:42:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s).
{code}
Can someone try killing TestBlockFixer and running TestFileArgs on the machine
that's running Hudson?
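One way to check the suspicion that `ant test-patch` pulls in an extra jar is
to print where a class is actually loaded from, once after `ant clean` and once
after `ant test-patch`, and compare the two. The following is only a sketch:
ClassOrigin is a hypothetical helper, not part of Hadoop, and the default class
name is just a guess at a useful starting point.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not part of Hadoop): reports which jar or
 * directory supplied a class, so two classpaths can be compared.
 */
final class ClassOrigin {

  /** Returns the jar or directory that supplied the class, as a string. */
  static String locationOf(Class<?> cls) {
    CodeSource source = cls.getProtectionDomain().getCodeSource();
    // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
    if (source == null || source.getLocation() == null) {
      return "(bootstrap class path)";
    }
    return source.getLocation().toString();
  }

  public static void main(String[] args) {
    // The default target is an assumption; pass other class names as arguments.
    String[] names = args.length > 0
        ? args : new String[] {"org.apache.hadoop.mapred.MiniMRCluster"};
    ClassLoader loader = ClassOrigin.class.getClassLoader();
    for (String name : names) {
      try {
        // initialize=false: locate the class without running its static initializers.
        Class<?> cls = Class.forName(name, false, loader);
        System.out.println(name + " <- " + locationOf(cls));
      } catch (ClassNotFoundException | LinkageError e) {
        System.out.println(name + " <- could not be loaded: " + e);
      }
    }
  }
}
```

It would have to be run with the same classpath that the junit task uses. If a
class resolves to a different jar after `ant test-patch`, that jar is a likely
suspect.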
> TestBlockFixer hangs initializing MiniMRCluster
> -----------------------------------------------
>
> Key: MAPREDUCE-2283
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/raid
> Affects Versions: 0.23.0
> Reporter: Nigel Daley
> Priority: Blocker
> Fix For: 0.22.0
>
>
> TestBlockFixer (a raid contrib test) is hanging the precommit testing on
> Hudson