Vinay,

Yes, my file system is fully POSIX-compatible, so you can do whatever you can do with, say, ext4: read/write/links/rm/... It can be mounted under any directory and works like a usual Linux file system. That's why I thought I could use file:///.
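Concretely, by "use file:///" I mean pointing the default filesystem at the local-filesystem implementation and keeping Hadoop's working data on my mount, i.e. roughly the following in core-site.xml (the same values as in my original mail quoted below, just written out as XML):

  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/myfs</value>
  </property>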
On Thu, Dec 18, 2014 at 3:19 AM, Vinayakumar B <[email protected]> wrote:
>
> As you mentioned, you have mapped your dfs as a partition (/mydfs). If it's
> fully accessible using normal File APIs then you can continue to use it as
> file:///. Remember you don't need any of the HDFS services (Namenode,
> Datanode).
>

Alright, does that mean I don't need to start either the NameNode on the master or the DataNodes on the slaves, just the ResourceManager on the master and the NodeManagers on the slaves? I actually tried this, and I get an error when I run a Hadoop example. For instance, I tried teragen and got the following:

14/12/18 11:53:15 INFO mapreduce.Job: Job job_1418918392941_0001 failed with state FAILED due to: Application application_1418918392941_0001 failed 2 times due to AM Container for appattempt_1418918392941_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://i01:8088/proxy/application_1418918392941_0001/ Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/hadoop-yarn/staging/kos/.staging/job_1418918392941_0001/job.splitmetainfo does not exist
java.io.FileNotFoundException: File file:/tmp/hadoop-yarn/staging/kos/.staging/job_1418918392941_0001/job.splitmetainfo does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.

Does the ResourceManager use HDFS to pass any information to the nodes? Is there any special configuration that I have to do? I am using the latest stable version, 2.6.0.
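One thing I noticed: the missing job.splitmetainfo is under file:/tmp/hadoop-yarn/staging, which is node-local rather than on the shared mount, so I am guessing the job staging directory may also need to live on /myfs. If so, I suppose it would mean something roughly like this in mapred-site.xml (just a guess on my side, I have not verified it yet):

  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <!-- guessed value: keep job staging files on the shared mount -->
    <value>/myfs/tmp/hadoop-yarn/staging</value>
  </property>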
Thanks,

> If your filesystem is accessed in a different way, then you might need to
> extend FileSystem and configure the corresponding configurations.
>
> -Vinay
>
> On Dec 18, 2014 1:10 PM, "yuliya Feldman" <[email protected]> wrote:
>
> > You forgot one important property: fs.<yourfs>.impl, to map it to the
> > class that has the implementation of your FS.
> >
> > The way you were trying to set up the other properties also looks like
> > usage of the local FS; you should probably not use file:///, but your FS
> > prefix, foo:/// - or a full URI.
> > Is your FS using NameNode and DataNode, or is it different? If it is
> > different, you don't need to try to bring those up; if it is using NN and
> > DN then you need to define the URI in fs.default.name and/or fs.defaultFS
> > (foo://namenode:8030).
> >
> >
> > From: Behrooz Shafiee <[email protected]>
> > To: [email protected]
> > Sent: Wednesday, December 17, 2014 5:05 PM
> > Subject: Using hadoop with other distributed filesystems
> >
> > Hello folks,
> >
> > I have developed my own distributed file system and I want to try it with
> > Hadoop MapReduce. It is a POSIX-compatible file system and can be mounted
> > under a directory, e.g. "/myfs". I was wondering how I can configure
> > Hadoop to use my own fs instead of HDFS. What are the configurations that
> > need to be changed? Or what source files should I modify? Using Google I
> > came across a sample of using Lustre with Hadoop and tried to apply it,
> > but it failed.
> >
> > I set up a cluster, mounted my own filesystem under /myfs on all of my
> > nodes, and changed core-site.xml and mapred-site.xml as follows:
> >
> > core-site.xml:
> >
> > fs.default.name -> file:///
> > fs.defaultFS -> file:///
> > hadoop.tmp.dir -> /myfs
> >
> > in mapred-site.xml:
> >
> > mapreduce.jobtracker.staging.root.dir -> /myfs/user
> > mapred.system.dir -> /myfs/system
> > mapred.local.dir -> /myfs/mapred_${host.name}
> >
> > and finally, hadoop-env.sh:
> >
> > added "-Dhost.name=`hostname -s`" to HADOOP_OPTS
> >
> > However, when I try to start my namenode, I get this error:
> >
> > 2014-12-17 19:44:35,902 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> > java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:///home/kos/msthesis/BFS/mountdir has no authority.
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:423)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:413)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:464)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:564)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:584)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
> > 2014-12-17 19:44:35,914 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> >
> > For starting datanodes I get this error:
> >
> > 2014-12-17 20:02:34,028 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
> > java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
> >         at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:866)
> >         at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1074)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:415)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2268)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
> > 2014-12-17 20:02:34,036 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> >
> > I would really appreciate it if anyone could help with these problems.
> > Thanks in advance,
> >
> > --
> > Behrooz
> >

--
Behrooz
