Hi,

I've been playing with 0.23.0, really nice stuff! I was able to set up a small test cluster (40 nodes) and launch the example jobs. I was also able to recompile old Hadoop programs against the new jars and run those as well. My question is the following:

We have an HDFS instance based on 0.20 that I would like to hook up to YARN. This appears to be a bit of work. Launching the jobs gives me the following error:

2011-12-05 15:48:05,023 INFO ipc.YarnRPC (YarnRPC.java:create(47)) - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2011-12-05 15:48:05,040 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at {removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,041 INFO ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
2011-12-05 15:48:05,121 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at {removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,133 INFO mapreduce.Cluster (Cluster.java:initialize(116)) - Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
    at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:189)

After a little digging, it appears that YarnClientProtocolProvider creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that is not available in older versions of HDFS.
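
For anyone who wants to confirm this on their own setup, a small classpath check along the following lines should be enough. This is only a sketch: ClasspathCheck and isClassAvailable are names made up for illustration and are not part of Hadoop. It uses nothing beyond the standard Class.forName call.

```java
// Hypothetical helper (not part of Hadoop): reports whether a class
// can be found on the classpath the JVM was started with.
class ClasspathCheck {

    // Returns true if the named class can be loaded, false otherwise.
    // The class is looked up without running its static initializers.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error above.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.fs.Hdfs";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

Running it once with the 0.20 jars on the classpath and once with the 0.23 jars should show which set is missing the class.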

What versions of HDFS are currently supported and what HDFS versions are planned for support? It would be great to be able to run YARN on legacy HDFS installations.

Thanks,

Avery
