See this thread: http://search-hadoop.com/m/LgpTk2dvrTk1
Cheers

On Dec 20, 2014, at 3:46 PM, Behrooz Shafiee <[email protected]> wrote:

> Thanks everyone,
> I finally managed to run MapReduce over my DFS. As you mentioned, there was
> no need to run a datanode or namenode. The only required config was to set
> yarn.app.mapreduce.am.staging-dir to point to my DFS so all the nodes could
> access it, as in HDFS.
> Something I noticed when I ran TestDFSIO is that the block size my
> filesystem gets for writes/reads is super small: 4k. I changed
> file.blocksize in core-site.xml but it did not make any change. I guess that
> only affects HDFS; is there any parameter, or somewhere in the code, where I
> can change the block size?
>
> Thanks,
>
> On Thu, Dec 18, 2014 at 1:00 PM, Allen Wittenauer <[email protected]> wrote:
>
>> I think you missed the point that Harsh was making:
>>
>> The namenode and datanode are used to build the hdfs:// filesystem. There
>> is no namenode or datanode in a file:/// setup. That's why running the
>> namenode blew up. If you want to use something besides hdfs://, then you
>> only run the YARN daemons.
>>
>> On Dec 18, 2014, at 8:56 AM, Behrooz Shafiee <[email protected]> wrote:
>>
>>> Because my FS is an in-memory distributed file system; therefore, I
>>> believe it can significantly improve IO-intensive tasks on Hadoop.
>>>
>>> On Thu, Dec 18, 2014 at 2:27 AM, Harsh J <[email protected]> wrote:
>>>>
>>>> NameNodes and DataNodes are services that are part of HDFS. Why are
>>>> you attempting to start them on top of your own DFS?
>>>>
>>>> On Thu, Dec 18, 2014 at 6:35 AM, Behrooz Shafiee <[email protected]>
>>>> wrote:
>>>>> Hello folks,
>>>>>
>>>>> I have developed my own distributed file system and I want to try it
>>>>> with Hadoop MapReduce. It is a POSIX-compatible file system and can be
>>>>> mounted under a directory, e.g. "/myfs". I was wondering how I can
>>>>> configure Hadoop to use my own FS instead of HDFS. What are the
>>>>> configurations that need to be changed?
>>>>> Or what source files should I modify? Using Google I came across a
>>>>> sample of using Lustre with Hadoop and tried to apply it, but it
>>>>> failed.
>>>>>
>>>>> I set up a cluster, mounted my own filesystem under /myfs on all of my
>>>>> nodes, and changed core-site.xml and mapred-site.xml as follows:
>>>>>
>>>>> core-site.xml:
>>>>>
>>>>> fs.default.name -> file:///
>>>>> fs.defaultFS -> file:///
>>>>> hadoop.tmp.dir -> /myfs
>>>>>
>>>>> in mapred-site.xml:
>>>>>
>>>>> mapreduce.jobtracker.staging.root.dir -> /myfs/user
>>>>> mapred.system.dir -> /myfs/system
>>>>> mapred.local.dir -> /myfs/mapred_${host.name}
>>>>>
>>>>> and finally, in hadoop-env.sh:
>>>>>
>>>>> added "-Dhost.name=`hostname -s`" to HADOOP_OPTS
>>>>>
>>>>> However, when I try to start my namenode, I get this error:
>>>>>
>>>>> 2014-12-17 19:44:35,902 FATAL
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
>>>>> java.lang.IllegalArgumentException: Invalid URI for NameNode address (check
>>>>> fs.defaultFS): file:///home/kos/msthesis/BFS/mountdir has no authority.
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:423)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:413)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:464)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:564)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:584)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
>>>>> 2014-12-17 19:44:35,914 INFO org.apache.hadoop.util.ExitUtil: Exiting with
>>>>> status 1
>>>>>
>>>>> For starting datanodes, I get this error:
>>>>>
>>>>> 2014-12-17 20:02:34,028 FATAL
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
>>>>> java.io.IOException: Incorrect configuration: namenode address
>>>>> dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not
>>>>> configured.
>>>>> at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:866)
>>>>> at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
>>>>> at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1074)
>>>>> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:415)
>>>>> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2268)
>>>>> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
>>>>> at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
>>>>> at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
>>>>> at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
>>>>> 2014-12-17 20:02:34,036 INFO org.apache.hadoop.util.ExitUtil: Exiting with
>>>>> status 1
>>>>>
>>>>> I would really appreciate any help with these problems.
>>>>> Thanks in advance,
>>>>>
>>>>> --
>>>>> Behrooz
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>> --
>>> Behrooz
>
>
> --
> Behrooz
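
[Editor's note] Pulling the thread's resolution together, a minimal config sketch for running MapReduce over a POSIX-mounted filesystem: no HDFS daemons at all, only the YARN daemons. The property names are standard Hadoop 2.x keys mentioned in the thread; the /myfs paths are just the poster's example mount point, not a requirement.

```xml
<!-- core-site.xml: point the default filesystem at the POSIX mount.
     LocalFileSystem has no namenode or datanode, so only the YARN
     daemons (ResourceManager, NodeManagers) are started. -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>

<!-- mapred-site.xml: put the MapReduce ApplicationMaster staging
     directory on the shared mount so every node sees the same job
     files -- the one setting the poster said made his setup work. -->
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/myfs/staging</value>
</property>
```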
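[Editor's note] On the unanswered 4k block-size question, a speculative lead: core-default.xml defines two separate knobs for file:// paths, `file.blocksize` (default 64 MB, the block size the local filesystem reports) and `file.stream-buffer-size` (default 4096 bytes, the read/write buffer). The 4 KB the poster measured matches the latter's default, so that buffer size, not the block size, may be what TestDFSIO is seeing. A core-site.xml sketch under that assumption (values illustrative):

```xml
<!-- Speculative fix: both keys exist in core-default.xml and apply to
     file:// paths, not HDFS (HDFS uses the separate dfs.blocksize key). -->
<property>
  <name>file.blocksize</name>
  <value>134217728</value> <!-- block size reported for file:// paths -->
</property>
<property>
  <name>file.stream-buffer-size</name>
  <value>1048576</value> <!-- I/O buffer; the 4096-byte default matches
                              the 4k figure observed in TestDFSIO -->
</property>
```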
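[Editor's note] The `-Dhost.name=...` line in the original post is a small trick worth spelling out: it exports the short hostname as a JVM system property so that a per-node setting like `mapred.local.dir=/myfs/mapred_${host.name}` expands to a different directory on each node. A sketch of the hadoop-env.sh fragment:

```shell
# hadoop-env.sh fragment (from the original post): expose the short
# hostname as a JVM system property so ${host.name} in *-site.xml
# expands per node.
export HADOOP_OPTS="$HADOOP_OPTS -Dhost.name=$(hostname -s)"
echo "$HADOOP_OPTS"
```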
