I think it would be nice if YARN could work on existing older HDFS instances; a lot of folks will be slow to upgrade HDFS with all their important data on it. I guess I could also go that route.

Avery

On 12/6/11 8:51 AM, Arun C Murthy wrote:
Avery,

  They aren't 'API changes'. HDFS just has a new set of APIs in hadoop-0.23 
(aka the FileContext APIs). Both the old (FileSystem) APIs and the new ones are 
supported in hadoop-0.23.

  We have used the new HDFS APIs in YARN in some places.
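To make the FileSystem/FileContext distinction concrete: as far as we understand the stock 0.23 configuration, the two API families resolve an `hdfs://` URI through different keys, `fs.hdfs.impl` for FileSystem and `fs.AbstractFileSystem.hdfs.impl` for FileContext, with the latter defaulting to `org.apache.hadoop.fs.Hdfs`. The sketch is a hypothetical model, not Hadoop code: a plain `Map` stands in for `Configuration`, `FsImplLookup` is an invented name, and the key names and default class names are assumptions to check against core-default.xml for your release.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Hadoop code) of how the two filesystem API families
 * pick an implementation class for a URI scheme. A plain Map stands in for
 * Hadoop's Configuration; keys and values are assumed core-default.xml entries.
 */
public class FsImplLookup {
    static final Map<String, String> DEFAULTS = new HashMap<>();

    static {
        // Old FileSystem API: key pattern "fs.<scheme>.impl" (assumed default)
        DEFAULTS.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // New FileContext API: key pattern "fs.AbstractFileSystem.<scheme>.impl" (assumed default)
        DEFAULTS.put("fs.AbstractFileSystem.hdfs.impl", "org.apache.hadoop.fs.Hdfs");
    }

    /** Class name the old FileSystem API would load for this scheme, or null if unmapped. */
    static String fileSystemImpl(String scheme) {
        return DEFAULTS.get("fs." + scheme + ".impl");
    }

    /** Class name the new FileContext API would load for this scheme, or null if unmapped. */
    static String fileContextImpl(String scheme) {
        return DEFAULTS.get("fs.AbstractFileSystem." + scheme + ".impl");
    }

    public static void main(String[] args) {
        String scheme = "hdfs";
        System.out.println("FileSystem  -> " + fileSystemImpl(scheme));
        System.out.println("FileContext -> " + fileContextImpl(scheme));
    }
}
```

Run as-is, it prints one line per API family. If a YARN code path goes through FileContext, a 0.20 HDFS jar that lacks `org.apache.hadoop.fs.Hdfs` would fail at exactly that class lookup, which would be consistent with the ClassNotFoundException Avery reports.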

hth,
Arun

On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:

Thank you for the response; that's what I thought as well =).  I spent the day 
trying to port the required 0.23 APIs to 0.20 HDFS.  There have been a lot of 
API changes!

Avery

On 12/5/11 9:14 PM, Mahadev Konar wrote:
Avery,
  Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
wrong, but looking at the HDFS APIs it doesn't look like it would
be a lot of work to get it to work with the 0.20 APIs. We had been
using the FileContext APIs initially but have transitioned back to the
old APIs.

Hope that helps.

mahadev

On Mon, Dec 5, 2011 at 4:01 PM, Avery Ching <ach...@apache.org> wrote:
Hi,

I've been playing with 0.23.0, really nice stuff!  I was able to set up a
small test cluster (40 nodes) and launch the example jobs.  I was also able
to recompile old Hadoop programs with the new jars and start up those
programs as well.  My question is the following:

We have an HDFS instance based on 0.20 that I would like to hook up to YARN.
This appears to be a bit of work.  Launching the jobs gives me the
following error:

2011-12-05 15:48:05,023 INFO  ipc.YarnRPC (YarnRPC.java:create(47)) - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2011-12-05 15:48:05,040 INFO  mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at {removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,041 INFO  ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
2011-12-05 15:48:05,121 INFO  mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at {removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,133 INFO  mapreduce.Cluster (Cluster.java:initialize(116)) - Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
    at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:189)

After doing a little digging, it appears that YarnClientProtocolProvider
creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that is
not available in older versions of HDFS.
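One quick way to confirm this diagnosis is to check which filesystem API classes the job's classpath can actually load. The following is a minimal, stdlib-only sketch; `HdfsClassProbe` is a hypothetical helper, not part of Hadoop, and the expectation that 0.20-era jars contain `FileSystem` but neither `FileContext` nor `Hdfs` is an assumption based on FileContext having been introduced after 0.20.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical diagnostic (not part of Hadoop): reports which classes are loadable. */
public class HdfsClassProbe {
    // Classes to look for; which release ships which class is an assumption.
    static final String[] CLASSES = {
        "org.apache.hadoop.fs.FileSystem",  // old API, expected in 0.20 and 0.23
        "org.apache.hadoop.fs.FileContext", // new API, expected only in newer releases
        "org.apache.hadoop.fs.Hdfs"         // the class YarnClientProtocolProvider failed to load
    };

    /** True if the class can be found; false if the class loader throws ClassNotFoundException. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers
            Class.forName(className, false, HdfsClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Probes each name, preserving the order in which they were given. */
    static Map<String, Boolean> probe(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> e : probe(CLASSES).entrySet()) {
            System.out.println((e.getValue() ? "found    " : "MISSING  ") + e.getKey());
        }
    }
}
```

Compile it and run it with the same classpath the job is launched with; a `MISSING  org.apache.hadoop.fs.Hdfs` line would match the ClassNotFoundException in the trace above.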

What versions of HDFS are currently supported and what HDFS versions are
planned for support?  It would be great to be able to run YARN on legacy
HDFS installations.

Thanks,

Avery
