Andrew,

> Just to be clear, I'm only sharing the Hadoop binaries and config files via 
> NFS.  I don't see how this would cause a conflict - do you have any 
> additional information?

FWIW, we had a similar experience when we were storing config files on
NFS on a large cluster. Intermittently (we suspect due to NFS
problems), Hadoop would fail to pick up the config files from NFS and
would fall back to its built-in defaults. Because the default values
for some directory paths differed from our actual configuration, this
produced very odd errors. We eventually solved the problem by moving
the config files off NFS. The size of the cluster (several hundred
slaves) was probably a factor, but nevertheless you may want to try
pulling everything off NFS.
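
One quick way to check is to print the effective config on an affected
node. A minimal sketch, assuming the 0.20-era API and that
hdfs-site.xml is on the classpath (the class name here is just for
illustration):

  import org.apache.hadoop.conf.Configuration;

  public class PrintDataDir {
      public static void main(String[] args) {
          // new Configuration() loads core-default.xml and
          // core-site.xml from the classpath; pull in hdfs-site.xml
          // explicitly as well.
          Configuration conf = new Configuration();
          conf.addResource("hdfs-site.xml");
          // If this prints the built-in default
          // (${hadoop.tmp.dir}/dfs/data) rather than your configured
          // path (e.g. /srv/hadoop/dfs/1), the NFS copy of the config
          // was not read.
          System.out.println("dfs.data.dir = " + conf.get("dfs.data.dir"));
      }
  }

If the printed value matches your hdfs-site.xml, the configs are being
read and the problem is elsewhere.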

Thanks
Hemanth

>
> The referenced path in the error below (/srv/hadoop/dfs/1) is not being 
> shared via NFS...
>
> Thanks,
> Andrew
>
> On May 13, 2010, at 6:51 PM, Jeff Zhang wrote:
>
>> Deploying Hadoop on NFS is not recommended; there will be conflicts
>> between DataNodes, because NFS shares the same filesystem namespace
>> across machines.
>>
>>
>>
>> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <and...@ucsfcti.org> wrote:
>>>
>>> Yes, in this deployment, I'm attempting to share the Hadoop files via NFS.  
>>> The log and pid directories are local.
>>>
>>> Thanks!
>>>
>>> --Andrew
>>>
>>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>>>
>>>> These 4 nodes share NFS ?
>>>>
>>>>
>>>> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
>>>> <andrew-lists-had...@ucsfcti.org> wrote:
>>>>> I'm working on bringing up a second test cluster and am getting these 
>>>>> intermittent errors on the DataNodes:
>>>>>
>>>>> 2010-05-12 17:17:15,094 ERROR 
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: 
>>>>> java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such 
>>>>> file or directory)
>>>>>        at java.io.RandomAccessFile.open(Native Method)
>>>>>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>        at 
>>>>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>>>>        at 
>>>>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>>>>        at 
>>>>> org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>>>>        at 
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>>>>        at 
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>>>>        at 
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>>>>        at 
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>>>>>
>>>>>
>>>>> There are 4 slaves, and sometimes 1 or 2 of them hit the error, but the 
>>>>> specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --Andrew
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>
