I figured out why this was hanging in the single-node case: the node manager was failing to start because the resource manager was already listening on the same port. I had followed Cloudera's example YARN setup, which incorrectly (or at least unwisely) uses port 8040 for yarn.resourcemanager.address:
https://ccp.cloudera.com/display/CDH4B2/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster

After switching to the default port (8032), both the single-node and clustered configurations now fail with this:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster. Program will exit.

How can I fix the classpath to include the appropriate JAR (hadoop-mapreduce-client-app-0.23.1-cdh4.0.0b2.jar)? It does appear to be on the nodemanager classpath.

Thanks,
Trevor

On Mon, May 14, 2012 at 3:46 PM, Trevor Robinson <tre...@scurrilous.com> wrote:
> Would someone please give me some troubleshooting tips for TestDFSIO
> hanging on a new 0.23.1-cdh4b2 cluster? I've tried both a 5-machine
> cluster and just running everything on a single node. It's my first
> time configuring YARN, so maybe I've misconfigured something. I don't
> see anything suspicious in the logs for namenode, datanode,
> resourcemanager, or nodemanager.
>
> $ sudo su hdfs -c 'bin/hadoop --config etc/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-0.23.1-cdh4.0.0b2-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1GB'
> 12/05/14 15:02:11 INFO fs.TestDFSIO: TestDFSIO.0.0.6
> 12/05/14 15:02:11 INFO fs.TestDFSIO: nrFiles = 10
> 12/05/14 15:02:11 INFO fs.TestDFSIO: fileSize (MB) = 1024.0
> 12/05/14 15:02:11 INFO fs.TestDFSIO: bufferSize = 1000000
> 12/05/14 15:02:11 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
> 12/05/14 15:02:11 INFO fs.TestDFSIO: creating control file: 1073741824 bytes, 10 files
> 12/05/14 15:02:12 INFO fs.TestDFSIO: created control files for: 10 files
> 12/05/14 15:02:13 INFO mapred.FileInputFormat: Total input paths to process : 10
> 12/05/14 15:02:13 INFO mapreduce.JobSubmitter: number of splits:10
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> 12/05/14 15:02:13 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
> 12/05/14 15:02:13 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
> 12/05/14 15:02:13 INFO mapred.ResourceMgrDelegate: Submitted application application_1337025701572_0001 to ResourceManager at server1/10.10.130.30:8040
> 12/05/14 15:02:13 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1337025701572_0001/
> 12/05/14 15:02:13 INFO mapreduce.Job: Running job: job_1337025701572_0001
> <30 minutes pass - no significant CPU or disk activity>
>
> Thanks,
> Trevor
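
A side note on the port conflict described at the top: the following is a minimal sketch, in Python, of one way to check whether something is already listening on a TCP port before starting another daemon on it. The helper name port_in_use and the choice of 127.0.0.1 are assumptions for illustration only; 8040 and 8032 are simply the two ports mentioned in this thread. It is not part of Hadoop, and it only detects listeners reachable on the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port is accepted.

    Hypothetical helper for illustration: a successful connect means
    some process is already listening on that port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success, an errno value otherwise
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from this thread: the example setup's 8040 and the default 8032
    for port in (8040, 8032):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

Running this on the node before starting the node manager would show whether the resource manager (or anything else) has already claimed the port in question.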