Thomas,

Your problem might lie simply with the virtual nodes' DNs using /tmp, with tmpfs backing that mount -- which is somehow causing the reported free space to show up as 0 in the reports to the NN (master). Note the tmpfs line in your df output:

tmpfs                 101M   44K  101M   1% /tmp

This causes your trouble: the NN can't choose a suitable DN to write to, because it determines that none has at least a block's worth of space (64MB default) available for writes. You can resolve it as follows:

1. Stop DFS completely.
2. Create a directory somewhere under / (I use Cloudera's distro, and its default configured location for data files is /var/lib/hadoop-0.20/cache/, if you need an idea for a location) and set it as your hadoop.tmp.dir in core-site.xml on all the nodes.
3. Reformat your NameNode (hadoop namenode -format, answer Y) and restart DFS.

Things _should_ be OK now. Config example (core-site.xml):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-0.20/cache</value>
</property>
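(The same steps as commands, roughly -- just a sketch, assuming a tarball-style install with the Hadoop bin/ scripts on the PATH and a "hadoop" user running the daemons; adjust names and paths to your setup:)

stop-dfs.sh                        # 1. stop DFS completely

# 2. On every node: create the new storage root and make sure the
#    user running the daemons owns it (user/group assumed here),
#    then set hadoop.tmp.dir in core-site.xml as above.
mkdir -p /var/lib/hadoop-0.20/cache
chown hadoop:hadoop /var/lib/hadoop-0.20/cache

hadoop namenode -format            # 3. reformat the NameNode (answer Y)
start-dfs.sh                       #    and restart DFS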
Let us know if this still doesn't get your dev cluster up and running for action :)

On Fri, Jul 15, 2011 at 4:40 PM, Thomas Anderson <t.dt.aander...@gmail.com> wrote:
> When doing the partitioning, I remember that only / and swap were
> specified for all nodes during creation. So I think /tmp is also
> mounted under /, which should have around 9G of space. The total hard
> disk size specified is 10G.
>
> df -kh shows
>
> server01:
> /dev/sda1             9.4G  2.3G  6.7G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M  132K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server02:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server03:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server04:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server05:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> In addition, the output of du -sk /tmp/hadoop-user/dfs is
>
> server02:
> 8       /tmp/hadoop-user/dfs/
>
> server03:
> 8       /tmp/hadoop-user/dfs/
>
> server04:
> 8       /tmp/hadoop-user/dfs/
>
> server05:
> 8       /tmp/hadoop-user/dfs/
>
> On Fri, Jul 15, 2011 at 7:01 PM, Harsh J <ha...@cloudera.com> wrote:
>> (P.s. I asked that because, if you look at your NN's live nodes
>> table, the reported space is all 0.)
>>
>> What's the output of:
>>
>> du -sk /tmp/hadoop-user/dfs
>>
>> on all your DNs?
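>> (E.g., to gather that from every DN in one shot -- just a sketch,
>> assuming passwordless ssh to the slaves under these hostnames:)
>>
>> for h in server02 server03 server04 server05; do
>>   echo "== $h"                          # label each node's output
>>   ssh "$h" du -sk /tmp/hadoop-user/dfs  # DFS dir usage on that DN
>> done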
>>
>> On Fri, Jul 15, 2011 at 4:01 PM, Harsh J <ha...@cloudera.com> wrote:
>>> Thomas,
>>>
>>> Is your /tmp mount point also under /, or is it separate? Your
>>> dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs; if /tmp is
>>> separately mounted, what's the available space on it?
>>>
>>> (It's a bad idea in production to keep things like dfs.name.dir and
>>> dfs.data.dir at their /tmp defaults, though -- reconfigure and
>>> restart as necessary.)
>>>
>>> On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson
>>> <t.dt.aander...@gmail.com> wrote:
>>>> 1.) The disk usage (with df -kh) on the namenode (server01):
>>>>
>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>>>>
>>>> and on the datanodes (server02 ~ server05):
>>>>
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>
>>>> 2.) How can I check whether a datanode is busy? The environment is
>>>> only for testing, so no other user processes are running at the
>>>> moment. It is also a fresh installation; only the packages Hadoop
>>>> requires, such as hadoop and the JDK, are installed.
>>>>
>>>> 3.) fs.block.size is not set in hdfs-site.xml on either the
>>>> datanodes or the namenode, because this setup is just for testing.
>>>> I thought it would use the default value, which should be 512?
>>>>
>>>> 4.) What might be a good way to quickly check whether the network
>>>> is unstable? I check the health page, e.g.
>>>> server01:50070/dfshealth.jsp, where the live nodes are up and Last
>>>> Contact varies between checks.
>>>>
>>>> Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>>>> server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>> server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>> server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>> server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>
>>>> 5.) Only the command `hadoop fs -put /tmp/testfile test` is issued,
>>>> as it is just to test whether the installation works. The file
>>>> (e.g. testfile) is removed first (hadoop fs -rm test/testfile),
>>>> then uploaded again with the hadoop put command.
>>>>
>>>> The logs are listed below:
>>>>
>>>> namenode:
>>>> server01: http://pastebin.com/TLpDmmPx
>>>>
>>>> datanodes:
>>>> server02: http://pastebin.com/pdE5XKfi
>>>> server03: http://pastebin.com/4aV7ECCV
>>>> server04: http://pastebin.com/tF7HiRZj
>>>> server05: http://pastebin.com/5qwSPrvU
>>>>
>>>> Please let me know if more information needs to be provided.
>>>>
>>>> I really appreciate your suggestions.
>>>>
>>>> Thank you.
>>>>
>>>> On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy <brahmared...@huawei.com> wrote:
>>>>> Hi,
>>>>>
>>>>> From this exception (could only be replicated to 0 nodes, instead
>>>>> of 1), it appears no Data Node is available to the Name Node.
>>>>>
>>>>> These are the cases in which a Data Node may not be available to
>>>>> the Name Node:
>>>>>
>>>>> 1) The Data Node disk is full.
>>>>>
>>>>> 2) The Data Node is busy with its block report and block scanning.
>>>>>
>>>>> 3) The block size is a negative value (dfs.block.size in
>>>>> hdfs-site.xml).
>>>>>
>>>>> 4) The primary Data Node goes down while a write is in progress
>>>>> (any n/w fluctuations between the Name Node and Data Node
>>>>> machines).
>>>>>
>>>>> 5) Whenever we append a partial chunk and call sync, for
>>>>> subsequent partial-chunk appends the client should keep the
>>>>> previous data in its buffer. For example, after appending "a" I
>>>>> have called sync, and when I try the next append the buffer should
>>>>> hold "ab". On the server side, when the chunk is not a multiple of
>>>>> 512, it will compare the CRC of the data present in the block file
>>>>> against the CRC present in the meta file. But while constructing
>>>>> the CRC for the data present in the block, it always compares only
>>>>> up to the initial offset.
>>>>>
>>>>> Or, for more analysis, please check the Data Node logs.
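>>>>> (A quick first check that narrows these down -- just a suggestion:
>>>>> `hadoop dfsadmin -report` prints each Data Node's configured
>>>>> capacity and DFS remaining, so case 1, or the all-zero capacity
>>>>> you are seeing, shows up immediately:)
>>>>>
>>>>> hadoop dfsadmin -report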
>>>>>
>>>>> Warm Regards,
>>>>>
>>>>> Brahma Reddy
>>>>>
>>>>> -----Original Message-----
>>>>> From: Thomas Anderson [mailto:t.dt.aander...@gmail.com]
>>>>> Sent: Friday, July 15, 2011 9:09 AM
>>>>> To: hdfs-user@hadoop.apache.org
>>>>> Subject: could only be replicated to 0 nodes, instead of 1
>>>>>
>>>>> I have a fresh hadoop 0.20.2 installation on virtualbox 4.0.8 with
>>>>> jdk 1.6.0_26. The problem is that when trying to put a file to
>>>>> hdfs, it throws the error `org.apache.hadoop.ipc.RemoteException:
>>>>> java.io.IOException: File /path/to/file could only be replicated
>>>>> to 0 nodes, instead of 1'; however, there is no problem creating a
>>>>> folder, as the ls command prints the result
>>>>>
>>>>> Found 1 items
>>>>> drwxr-xr-x   - user supergroup          0 2011-07-15 11:09 /user/user/test
>>>>>
>>>>> I also tried flushing the firewall (removing all iptables
>>>>> restrictions), but the error message is still thrown when
>>>>> uploading a file (hadoop fs -put /tmp/x test) from the local fs.
>>>>>
>>>>> The name node log shows
>>>>>
>>>>> 2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
>>>>> 2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
>>>>> 2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
>>>>> 2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
>>>>> 2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
>>>>> 2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.25:50010
>>>>>
>>>>> And all datanodes have messages similar to the following:
>>>>>
>>>>> 2011-07-15 10:42:46,562 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: using
>>>>> BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
>>>>> 2011-07-15 10:42:47,163 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>> blocks got processed in 3 msecs
>>>>> 2011-07-15 10:42:47,187 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
>>>>> block scanner.
>>>>> 2011-07-15 11:19:42,931 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>> blocks got processed in 1 msecs
>>>>>
>>>>> The command `hadoop fsck /` displays
>>>>>
>>>>> Status: HEALTHY
>>>>>  Total size:    0 B
>>>>>  Total dirs:    3
>>>>>  Total files:   0 (Files currently being written: 1)
>>>>>  Total blocks (validated):      0
>>>>>  Minimally replicated blocks:   0
>>>>>  Over-replicated blocks:        0
>>>>>  Under-replicated blocks:       0
>>>>>  Mis-replicated blocks:         0
>>>>>  Default replication factor:    3
>>>>>  Average block replication:     0.0
>>>>>  Corrupt blocks:                0
>>>>>  Missing replicas:              0
>>>>>  Number of data-nodes:          4
>>>>>
>>>>> The settings in conf include:
>>>>>
>>>>> - Master node:
>>>>> core-site.xml
>>>>> <property>
>>>>>   <name>fs.default.name</name>
>>>>>   <value>hdfs://lab01:9000/</value>
>>>>> </property>
>>>>>
>>>>> hdfs-site.xml
>>>>> <property>
>>>>>   <name>dfs.replication</name>
>>>>>   <value>3</value>
>>>>> </property>
>>>>>
>>>>> - Slave nodes:
>>>>> core-site.xml
>>>>> <property>
>>>>>   <name>fs.default.name</name>
>>>>>   <value>hdfs://lab01:9000/</value>
>>>>> </property>
>>>>>
>>>>> hdfs-site.xml
>>>>> <property>
>>>>>   <name>dfs.replication</name>
>>>>>   <value>3</value>
>>>>> </property>
>>>>>
>>>>> Am I missing any configuration? Or is there any place I can check?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>> --
>> Harsh J
>>
>

--
Harsh J