Thanks everyone, I finally managed to run MapReduce over my DFS. As you mentioned, there was no need to run a namenode or datanode. The only required configuration was to set yarn.app.mapreduce.am.staging-dir to point to my DFS, so that all the nodes can access it just as they would with HDFS.

One thing I noticed when I ran TestDFSIO is that the block size my filesystem gets for reads/writes is very small (4 KB). I changed file.blocksize in core-site.xml, but it did not make any difference; I guess that property only affects HDFS. Is there any parameter, or somewhere in the code, where I can change the block size?
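For reference, the relevant properties I ended up with look roughly like the following. The exact staging path under /myfs and the 128 MB value are just placeholders I picked, and fs.local.block.size is only a guess on my part for what file:/// might actually consult; I have not verified it yet.

core-site.xml:

  <!-- Run MapReduce directly on the mounted POSIX filesystem, no HDFS daemons -->
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>

  <!-- What I tried for the block size; no visible effect on TestDFSIO so far -->
  <property>
    <name>file.blocksize</name>
    <value>134217728</value>
  </property>

  <!-- Only a guess: maybe the local/file scheme reads this older key instead -->
  <property>
    <name>fs.local.block.size</name>
    <value>134217728</value>
  </property>

mapred-site.xml:

  <!-- Staging dir on the shared DFS mount so every node sees the same job files
       (the path under /myfs is just my choice) -->
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/myfs/tmp/hadoop-yarn/staging</value>
  </property>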
Thanks,

On Thu, Dec 18, 2014 at 1:00 PM, Allen Wittenauer <[email protected]> wrote:
>
> I think you missed the point that Harsh was pointing out:
>
> The namenode and datanode are used to build the hdfs:// filesystem. There
> is no namenode or datanode in a file:/// setup. That's why running the
> namenode blew up. If you want to use something besides hdfs://, then you
> only run the YARN daemons.
>
> On Dec 18, 2014, at 8:56 AM, Behrooz Shafiee <[email protected]> wrote:
>
> > Because my FS is an in-memory distributed file system; therefore, I believe
> > it can significantly improve IO-intensive tasks on Hadoop.
> >
> > On Thu, Dec 18, 2014 at 2:27 AM, Harsh J <[email protected]> wrote:
> >>
> >> NameNodes and DataNodes are services that are part of HDFS. Why are
> >> you attempting to start them on top of your own DFS?
> >>
> >> On Thu, Dec 18, 2014 at 6:35 AM, Behrooz Shafiee <[email protected]> wrote:
> >>> Hello folks,
> >>>
> >>> I have developed my own distributed file system and I want to try it with
> >>> Hadoop MapReduce. It is a POSIX-compatible file system and can be mounted
> >>> under a directory, e.g. "/myfs". I was wondering how I can configure Hadoop
> >>> to use my own FS instead of HDFS. What are the configurations that need to
> >>> be changed? Or what source files should I modify? Using Google I came
> >>> across a sample of using Lustre with Hadoop and tried to apply it, but it
> >>> failed.
> >>>
> >>> I set up a cluster, mounted my own filesystem under /myfs on all of my
> >>> nodes, and changed core-site.xml and mapred-site.xml as follows:
> >>>
> >>> core-site.xml:
> >>>
> >>> fs.default.name -> file:///
> >>> fs.defaultFS -> file:///
> >>> hadoop.tmp.dir -> /myfs
> >>>
> >>> in mapred-site.xml:
> >>>
> >>> mapreduce.jobtracker.staging.root.dir -> /myfs/user
> >>> mapred.system.dir -> /myfs/system
> >>> mapred.local.dir -> /myfs/mapred_${host.name}
> >>>
> >>> and finally, in hadoop-env.sh:
> >>>
> >>> added "-Dhost.name=`hostname -s`" to HADOOP_OPTS
> >>>
> >>> However, when I try to start my namenode, I get this error:
> >>>
> >>> 2014-12-17 19:44:35,902 FATAL
> >>> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> >>> java.lang.IllegalArgumentException: Invalid URI for NameNode address (check
> >>> fs.defaultFS): file:///home/kos/msthesis/BFS/mountdir has no authority.
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:423)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:413)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:464)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:564)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:584)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
> >>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
> >>> 2014-12-17 19:44:35,914 INFO org.apache.hadoop.util.ExitUtil: Exiting with
> >>> status 1
> >>>
> >>> For starting datanodes I get this error:
> >>>
> >>> 2014-12-17 20:02:34,028 FATAL
> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
> >>> java.io.IOException: Incorrect configuration: namenode address
> >>> dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not
> >>> configured.
> >>>     at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:866)
> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1074)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:415)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2268)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
> >>> 2014-12-17 20:02:34,036 INFO org.apache.hadoop.util.ExitUtil: Exiting with
> >>> status 1
> >>>
> >>> I would really appreciate it if anyone could help with these problems.
> >>> Thanks in advance,
> >>>
> >>> --
> >>> Behrooz
> >>
> >> --
> >> Harsh J
> >
> > --
> > Behrooz

--
Behrooz
