Harsh,

Thanks, you are right. The problem stems from the tmp directory not having
enough space. After changing the tmp dir to another location, the problem
went away.
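(For the archives: one quick way to confirm this kind of fix has taken
effect -- assuming the cluster is up -- is to check that the DataNodes now
report nonzero capacity, e.g.:

  hadoop dfsadmin -report

This prints configured and remaining space per DataNode, the same numbers
the NN's live-nodes page shows.)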
But I remember the default block size in HDFS is 64 MB, so shouldn't it at
least allow one file, whose actual size on local disk is smaller than 1 KB,
to be uploaded? Thanks again for the advice.

On Fri, Jul 15, 2011 at 7:49 PM, Harsh J <ha...@cloudera.com> wrote:
> Thomas,
>
> Your problem might lie simply with the virtual nodes' DNs using /tmp,
> with tmpfs backing it -- which somehow causes the reported free space to
> go to 0 in reports to the NN (master):
>
> tmpfs                 101M   44K  101M   1% /tmp
>
> This causes your trouble: the NN can't choose a suitable DN to write to,
> because it determines that none has at least a block size worth of space
> (64 MB by default) available for writes.
>
> You can resolve it as follows:
>
> 1. Stop DFS completely.
>
> 2. Create a directory under root somewhere (I use Cloudera's distro, and
> its default configured location for data files comes along as
> /var/lib/hadoop-0.20/cache/, if you need an idea for a location) and set
> it as your hadoop.tmp.dir in core-site.xml on all the nodes.
>
> 3. Reformat your NameNode (hadoop namenode -format, say Y) and restart
> DFS. Things _should_ be OK now.
>
> Config example (core-site.xml):
>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/var/lib/hadoop-0.20/cache</value>
> </property>
>
> Let us know if this still doesn't get your dev cluster up and running
> for action :)
>
> On Fri, Jul 15, 2011 at 4:40 PM, Thomas Anderson
> <t.dt.aander...@gmail.com> wrote:
>> When partitioning, I remember only / and swap were specified for all
>> nodes during creation. So I think /tmp is also mounted under /, which
>> should have around 9 GB available. The total disk size specified is
>> 10 GB.
>>
>> df -kh shows:
>>
>> server01:
>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M  132K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server02:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server03:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server04:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server05:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> In addition, the output of du -sk /tmp/hadoop-user/dfs is:
>>
>> server02:
>> 8       /tmp/hadoop-user/dfs/
>>
>> server03:
>> 8       /tmp/hadoop-user/dfs/
>>
>> server04:
>> 8       /tmp/hadoop-user/dfs/
>>
>> server05:
>> 8       /tmp/hadoop-user/dfs/
>>
>> On Fri, Jul 15, 2011 at 7:01 PM, Harsh J <ha...@cloudera.com> wrote:
>>> (P.S. I asked because, if you look at your NN's live-nodes table, the
>>> reported space is all 0.)
>>>
>>> What's the output of:
>>>
>>> du -sk /tmp/hadoop-user/dfs
>>>
>>> on all your DNs?
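>>>
>>> (Something like the following works, assuming passwordless ssh to the
>>> slaves -- the hostnames are just the ones from your cluster:
>>>
>>> for h in server02 server03 server04 server05; do
>>>   ssh "$h" du -sk /tmp/hadoop-user/dfs
>>> done
>>>
>>> Any one-liner that collects the same numbers is fine.)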
>>>
>>> On Fri, Jul 15, 2011 at 4:01 PM, Harsh J <ha...@cloudera.com> wrote:
>>>> Thomas,
>>>>
>>>> Is your /tmp mount point also under /, or is it separate? Your
>>>> dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs; if it is
>>>> separately mounted, what's the available space on that mount?
>>>>
>>>> (It's a bad idea in production to keep things default on /tmp,
>>>> though, like dfs.name.dir and dfs.data.dir -- reconfigure+restart as
>>>> necessary.)
>>>>
>>>> On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson
>>>> <t.dt.aander...@gmail.com> wrote:
>>>>> 1) The disk usage (with df -kh) on the namenode (server01):
>>>>>
>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>>>>>
>>>>> and on the datanodes (server02 ~ server05):
>>>>>
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>>
>>>>> 2) How can I tell whether a datanode is busy? The environment is
>>>>> only for testing, so no other user processes are running at the
>>>>> moment. It is also a fresh installation, so only the packages Hadoop
>>>>> requires are installed, such as hadoop and the JDK.
>>>>>
>>>>> 3) fs.block.size is not set in hdfs-site.xml on either the datanodes
>>>>> or the namenode, because this is just for testing. I thought it
>>>>> would use the default value, which should be 512?
>>>>>
>>>>> 4) What might be a good way to quickly check whether the network is
>>>>> unstable? I check the health page, e.g.
>>>>> server01:50070/dfshealth.jsp, where the live nodes are up and the
>>>>> last contact varies between checks.
>>>>>
>>>>> Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>>>>> server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>> server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>> server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>> server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>>
>>>>> 5) Only the command `hadoop fs -put /tmp/testfile test` is issued,
>>>>> as this is just to test whether the installation works. The file,
>>>>> e.g. testfile, is removed first (hadoop fs -rm test/testfile), then
>>>>> uploaded again with hadoop fs -put.
>>>>>
>>>>> The logs are listed below:
>>>>>
>>>>> namenode:
>>>>> server01: http://pastebin.com/TLpDmmPx
>>>>>
>>>>> datanodes:
>>>>> server02: http://pastebin.com/pdE5XKfi
>>>>> server03: http://pastebin.com/4aV7ECCV
>>>>> server04: http://pastebin.com/tF7HiRZj
>>>>> server05: http://pastebin.com/5qwSPrvU
>>>>>
>>>>> Please let me know if more information needs to be provided.
>>>>>
>>>>> I really appreciate your suggestion.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy
>>>>> <brahmared...@huawei.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Seeing this exception (could only be replicated to 0 nodes, instead
>>>>>> of 1) means no DataNode was available to the NameNode.
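>>>>>>
>>>>>> (A quick way to narrow down which case applies is to grep the
>>>>>> NameNode log around the time of the failed write -- the path below
>>>>>> is the stock 0.20 layout, so adjust it to your install:
>>>>>>
>>>>>> grep -i "replicated to 0 nodes" $HADOOP_HOME/logs/hadoop-*-namenode-*.log
>>>>>>
>>>>>> then check the DataNode logs for disk or block-report errors around
>>>>>> the same timestamp.)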
>>>>>>
>>>>>> These are the cases in which a DataNode may not be available to the
>>>>>> NameNode:
>>>>>>
>>>>>> 1) The DataNode's disk is full.
>>>>>>
>>>>>> 2) The DataNode is busy with its block report and block scanning.
>>>>>>
>>>>>> 3) The block size is a negative value (dfs.block.size in
>>>>>> hdfs-site.xml).
>>>>>>
>>>>>> 4) The primary DataNode goes down while a write is in progress (any
>>>>>> network fluctuation between the NameNode and DataNode machines).
>>>>>>
>>>>>> 5) Whenever we append a partial chunk and call sync, then for
>>>>>> subsequent partial-chunk appends the client should keep the
>>>>>> previous data in its buffer. For example, after appending "a" I
>>>>>> called sync, and when I next try to append, the buffer should hold
>>>>>> "ab". On the server side, when the chunk is not a multiple of 512,
>>>>>> it compares the CRC of the data present in the block file against
>>>>>> the CRC present in the meta file; but while constructing the CRC
>>>>>> for the data in the block, it always compares only up to the
>>>>>> initial offset.
>>>>>>
>>>>>> For more analysis, please check the DataNode logs.
>>>>>>
>>>>>> Warm Regards,
>>>>>>
>>>>>> Brahma Reddy
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Thomas Anderson [mailto:t.dt.aander...@gmail.com]
>>>>>> Sent: Friday, July 15, 2011 9:09 AM
>>>>>> To: hdfs-user@hadoop.apache.org
>>>>>> Subject: could only be replicated to 0 nodes, instead of 1
>>>>>>
>>>>>> I have a fresh Hadoop 0.20.2 installation on VirtualBox 4.0.8 with
>>>>>> JDK 1.6.0_26. The problem is that when trying to put a file to
>>>>>> HDFS, it throws the error
>>>>>> `org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>>>>>> /path/to/file could only be replicated to 0 nodes, instead of 1';
>>>>>> however, there is no problem creating a folder, as the ls command
>>>>>> prints:
>>>>>>
>>>>>> Found 1 items
>>>>>> drwxr-xr-x   - user supergroup          0 2011-07-15 11:09 /user/user/test
>>>>>>
>>>>>> I also tried flushing the firewall (removing all iptables
>>>>>> restrictions), but the error message is still thrown when uploading
>>>>>> a file (hadoop fs -put /tmp/x test) from the local fs.
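>>>>>>
>>>>>> (The flush was roughly the following, run on every node -- listed
>>>>>> here only in case the exact commands matter:
>>>>>>
>>>>>> iptables -F
>>>>>> iptables -X
>>>>>> iptables -P INPUT ACCEPT
>>>>>> iptables -P FORWARD ACCEPT
>>>>>> iptables -P OUTPUT ACCEPT
>>>>>> )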
>>>>>>
>>>>>> The name node log shows:
>>>>>>
>>>>>> 2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> aaa.bbb.ccc.ddd.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
>>>>>> 2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
>>>>>> 2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
>>>>>> 2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
>>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> aaa.bbb.ccc.ddd.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
>>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
>>>>>> 2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> 140.127.220.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
>>>>>> 2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.25:50010
>>>>>>
>>>>>> And all datanodes have messages similar to the following:
>>>>>>
>>>>>> 2011-07-15 10:42:46,562 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: using
>>>>>> BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
>>>>>> 2011-07-15 10:42:47,163 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>>> blocks got processed in 3 msecs
>>>>>> 2011-07-15 10:42:47,187 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
>>>>>> block scanner.
>>>>>> 2011-07-15 11:19:42,931 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>>> blocks got processed in 1 msecs
>>>>>>
>>>>>> The command `hadoop fsck /` displays:
>>>>>>
>>>>>> Status: HEALTHY
>>>>>>  Total size:                    0 B
>>>>>>  Total dirs:                    3
>>>>>>  Total files:                   0 (Files currently being written: 1)
>>>>>>  Total blocks (validated):      0
>>>>>>  Minimally replicated blocks:   0
>>>>>>  Over-replicated blocks:        0
>>>>>>  Under-replicated blocks:       0
>>>>>>  Mis-replicated blocks:         0
>>>>>>  Default replication factor:    3
>>>>>>  Average block replication:     0.0
>>>>>>  Corrupt blocks:                0
>>>>>>  Missing replicas:              0
>>>>>>  Number of data-nodes:          4
>>>>>>
>>>>>> The settings in conf include:
>>>>>>
>>>>>> - Master node:
>>>>>>
>>>>>> core-site.xml
>>>>>> <property>
>>>>>>   <name>fs.default.name</name>
>>>>>>   <value>hdfs://lab01:9000/</value>
>>>>>> </property>
>>>>>>
>>>>>> hdfs-site.xml
>>>>>> <property>
>>>>>>   <name>dfs.replication</name>
>>>>>>   <value>3</value>
>>>>>> </property>
>>>>>>
>>>>>> - Slave nodes:
>>>>>>
>>>>>> core-site.xml
>>>>>> <property>
>>>>>>   <name>fs.default.name</name>
>>>>>>   <value>hdfs://lab01:9000/</value>
>>>>>> </property>
>>>>>>
>>>>>> hdfs-site.xml
>>>>>> <property>
>>>>>>   <name>dfs.replication</name>
>>>>>>   <value>3</value>
>>>>>> </property>
>>>>>>
>>>>>> Am I missing any configuration? Or is there any other place I can
>>>>>> check?
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>> --
>>> Harsh J
>>
>
>
> --
> Harsh J