Harsh,

Thanks, you are right. The problem stems from the tmp directory not having
enough space. After changing the tmp dir to another location, the problem
went away.
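(For the archives: one quick way to confirm this kind of fix has taken
effect -- assuming the cluster is up -- is to check that the DataNodes now
report nonzero capacity, e.g.:

  hadoop dfsadmin -report

This prints configured and remaining space per DataNode, the same numbers
the NN's live-nodes page shows.)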
But I remember the default block size in HDFS is 64 MB, so shouldn't it at
least allow one file, whose actual size on local disk is smaller than 1 KB,
to be uploaded? Thanks again for the advice.

On Fri, Jul 15, 2011 at 7:49 PM, Harsh J <ha...@cloudera.com> wrote:
> Thomas,
>
> Your problem might lie simply with the virtual nodes' DNs using /tmp,
> with tmpfs backing it -- which somehow causes the reported free space to
> go to 0 in reports to the NN (master):
>
> tmpfs                 101M   44K  101M   1% /tmp
>
> This causes your trouble: the NN can't choose a suitable DN to write to,
> because it determines that none has at least a block size worth of space
> (64 MB by default) available for writes.
>
> You can resolve it as follows:
>
> 1. Stop DFS completely.
>
> 2. Create a directory under root somewhere (I use Cloudera's distro, and
> its default configured location for data files comes along as
> /var/lib/hadoop-0.20/cache/, if you need an idea for a location) and set
> it as your hadoop.tmp.dir in core-site.xml on all the nodes.
>
> 3. Reformat your NameNode (hadoop namenode -format, say Y) and restart
> DFS. Things _should_ be OK now.
>
> Config example (core-site.xml):
>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/var/lib/hadoop-0.20/cache</value>
> </property>
>
> Let us know if this still doesn't get your dev cluster up and running
> for action :)
>
> On Fri, Jul 15, 2011 at 4:40 PM, Thomas Anderson
> <t.dt.aander...@gmail.com> wrote:
>> When partitioning, I remember only / and swap were specified for all
>> nodes during creation. So I think /tmp is also mounted under /, which
>> should have around 9 GB available. The total disk size specified is
>> 10 GB.
>>
>> df -kh shows:
>>
>> server01:
>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M  132K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server02:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server03:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server04:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> server05:
>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>> tmpfs                 101M   44K  101M   1% /tmp
>> udev                  247M     0  247M   0% /dev
>> tmpfs                 101M     0  101M   0% /var/run/shm
>> tmpfs                  51M  176K   51M   1% /var/run
>>
>> In addition, the output of du -sk /tmp/hadoop-user/dfs is:
>>
>> server02:
>> 8       /tmp/hadoop-user/dfs/
>>
>> server03:
>> 8       /tmp/hadoop-user/dfs/
>>
>> server04:
>> 8       /tmp/hadoop-user/dfs/
>>
>> server05:
>> 8       /tmp/hadoop-user/dfs/
>>
>> On Fri, Jul 15, 2011 at 7:01 PM, Harsh J <ha...@cloudera.com> wrote:
>>> (P.S. I asked because, if you look at your NN's live-nodes table, the
>>> reported space is all 0.)
>>>
>>> What's the output of:
>>>
>>> du -sk /tmp/hadoop-user/dfs
>>>
>>> on all your DNs?
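>>>
>>> (Something like the following works, assuming passwordless ssh to the
>>> slaves -- the hostnames are just the ones from your cluster:
>>>
>>> for h in server02 server03 server04 server05; do
>>>   ssh "$h" du -sk /tmp/hadoop-user/dfs
>>> done
>>>
>>> Any one-liner that collects the same numbers is fine.)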
>>>
>>> On Fri, Jul 15, 2011 at 4:01 PM, Harsh J <ha...@cloudera.com> wrote:
>>>> Thomas,
>>>>
>>>> Is your /tmp mount point also under /, or is it separate? Your
>>>> dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs; if it is
>>>> separately mounted, what's the available space on that mount?
>>>>
>>>> (It's a bad idea in production to keep things default on /tmp,
>>>> though, like dfs.name.dir and dfs.data.dir -- reconfigure+restart as
>>>> necessary.)
>>>>
>>>> On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson
>>>> <t.dt.aander...@gmail.com> wrote:
>>>>> 1) The disk usage (with df -kh) on the namenode (server01):
>>>>>
>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>>>>>
>>>>> and on the datanodes (server02 ~ server05):
>>>>>
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>>
>>>>> 2) How can I tell whether a datanode is busy? The environment is
>>>>> only for testing, so no other user processes are running at the
>>>>> moment. It is also a fresh installation, so only the packages Hadoop
>>>>> requires are installed, such as hadoop and the JDK.
>>>>>
>>>>> 3) fs.block.size is not set in hdfs-site.xml on either the datanodes
>>>>> or the namenode, because this is just for testing. I thought it
>>>>> would use the default value, which should be 512?
>>>>>
>>>>> 4) What might be a good way to quickly check whether the network is
>>>>> unstable? I check the health page, e.g.
>>>>> server01:50070/dfshealth.jsp, where the live nodes are up and the
>>>>> last contact varies between checks.
>>>>>
>>>>> Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>>>>> server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>> server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>> server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>> server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>>
>>>>> 5) Only the command `hadoop fs -put /tmp/testfile test` is issued,
>>>>> as this is just to test whether the installation works. The file,
>>>>> e.g. testfile, is removed first (hadoop fs -rm test/testfile), then
>>>>> uploaded again with hadoop fs -put.
>>>>>
>>>>> The logs are listed below:
>>>>>
>>>>> namenode:
>>>>> server01: http://pastebin.com/TLpDmmPx
>>>>>
>>>>> datanodes:
>>>>> server02: http://pastebin.com/pdE5XKfi
>>>>> server03: http://pastebin.com/4aV7ECCV
>>>>> server04: http://pastebin.com/tF7HiRZj
>>>>> server05: http://pastebin.com/5qwSPrvU
>>>>>
>>>>> Please let me know if more information needs to be provided.
>>>>>
>>>>> I really appreciate your suggestion.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy
>>>>> <brahmared...@huawei.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Seeing this exception (could only be replicated to 0 nodes, instead
>>>>>> of 1) means no DataNode was available to the NameNode.
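>>>>>>
>>>>>> (A quick way to narrow down which case applies is to grep the
>>>>>> NameNode log around the time of the failed write -- the path below
>>>>>> is the stock 0.20 layout, so adjust it to your install:
>>>>>>
>>>>>> grep -i "replicated to 0 nodes" $HADOOP_HOME/logs/hadoop-*-namenode-*.log
>>>>>>
>>>>>> then check the DataNode logs for disk or block-report errors around
>>>>>> the same timestamp.)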
>>>>>>
>>>>>> These are the cases in which a DataNode may not be available to the
>>>>>> NameNode:
>>>>>>
>>>>>> 1) The DataNode's disk is full.
>>>>>>
>>>>>> 2) The DataNode is busy with its block report and block scanning.
>>>>>>
>>>>>> 3) The block size is a negative value (dfs.block.size in
>>>>>> hdfs-site.xml).
>>>>>>
>>>>>> 4) The primary DataNode goes down while a write is in progress (any
>>>>>> network fluctuation between the NameNode and DataNode machines).
>>>>>>
>>>>>> 5) Whenever we append a partial chunk and call sync, then for
>>>>>> subsequent partial-chunk appends the client should keep the
>>>>>> previous data in its buffer. For example, after appending "a" I
>>>>>> called sync, and when I next try to append, the buffer should hold
>>>>>> "ab". On the server side, when the chunk is not a multiple of 512,
>>>>>> it compares the CRC of the data present in the block file against
>>>>>> the CRC present in the meta file; but while constructing the CRC
>>>>>> for the data in the block, it always compares only up to the
>>>>>> initial offset.
>>>>>>
>>>>>> For more analysis, please check the DataNode logs.
>>>>>>
>>>>>> Warm Regards,
>>>>>>
>>>>>> Brahma Reddy
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Thomas Anderson [mailto:t.dt.aander...@gmail.com]
>>>>>> Sent: Friday, July 15, 2011 9:09 AM
>>>>>> To: hdfs-user@hadoop.apache.org
>>>>>> Subject: could only be replicated to 0 nodes, instead of 1
>>>>>>
>>>>>> I have a fresh Hadoop 0.20.2 installation on VirtualBox 4.0.8 with
>>>>>> JDK 1.6.0_26. The problem is that when trying to put a file to
>>>>>> HDFS, it throws the error
>>>>>> `org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>>>>>> /path/to/file could only be replicated to 0 nodes, instead of 1';
>>>>>> however, there is no problem creating a folder, as the ls command
>>>>>> prints:
>>>>>>
>>>>>> Found 1 items
>>>>>> drwxr-xr-x   - user supergroup          0 2011-07-15 11:09 /user/user/test
>>>>>>
>>>>>> I also tried flushing the firewall (removing all iptables
>>>>>> restrictions), but the error message is still thrown when uploading
>>>>>> a file (hadoop fs -put /tmp/x test) from the local fs.
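>>>>>>
>>>>>> (The flush was roughly the following, run on every node -- listed
>>>>>> here only in case the exact commands matter:
>>>>>>
>>>>>> iptables -F
>>>>>> iptables -X
>>>>>> iptables -P INPUT ACCEPT
>>>>>> iptables -P FORWARD ACCEPT
>>>>>> iptables -P OUTPUT ACCEPT
>>>>>> )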
>>>>>>
>>>>>> The name node log shows:
>>>>>>
>>>>>> 2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> aaa.bbb.ccc.ddd.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
>>>>>> 2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
>>>>>> 2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
>>>>>> 2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
>>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> aaa.bbb.ccc.ddd.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
>>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
>>>>>> 2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>> 140.127.220.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
>>>>>> 2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.25:50010
>>>>>>
>>>>>> And all datanodes have messages similar to the following:
>>>>>>
>>>>>> 2011-07-15 10:42:46,562 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: using
>>>>>> BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
>>>>>> 2011-07-15 10:42:47,163 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>>> blocks got processed in 3 msecs
>>>>>> 2011-07-15 10:42:47,187 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
>>>>>> block scanner.
>>>>>> 2011-07-15 11:19:42,931 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>>> blocks got processed in 1 msecs
>>>>>>
>>>>>> The command `hadoop fsck /` displays:
>>>>>>
>>>>>> Status: HEALTHY
>>>>>>  Total size:                    0 B
>>>>>>  Total dirs:                    3
>>>>>>  Total files:                   0 (Files currently being written: 1)
>>>>>>  Total blocks (validated):      0
>>>>>>  Minimally replicated blocks:   0
>>>>>>  Over-replicated blocks:        0
>>>>>>  Under-replicated blocks:       0
>>>>>>  Mis-replicated blocks:         0
>>>>>>  Default replication factor:    3
>>>>>>  Average block replication:     0.0
>>>>>>  Corrupt blocks:                0
>>>>>>  Missing replicas:              0
>>>>>>  Number of data-nodes:          4
>>>>>>
>>>>>> The settings in conf include:
>>>>>>
>>>>>> - Master node:
>>>>>>
>>>>>> core-site.xml
>>>>>> <property>
>>>>>>   <name>fs.default.name</name>
>>>>>>   <value>hdfs://lab01:9000/</value>
>>>>>> </property>
>>>>>>
>>>>>> hdfs-site.xml
>>>>>> <property>
>>>>>>   <name>dfs.replication</name>
>>>>>>   <value>3</value>
>>>>>> </property>
>>>>>>
>>>>>> - Slave nodes:
>>>>>>
>>>>>> core-site.xml
>>>>>> <property>
>>>>>>   <name>fs.default.name</name>
>>>>>>   <value>hdfs://lab01:9000/</value>
>>>>>> </property>
>>>>>>
>>>>>> hdfs-site.xml
>>>>>> <property>
>>>>>>   <name>dfs.replication</name>
>>>>>>   <value>3</value>
>>>>>> </property>
>>>>>>
>>>>>> Am I missing any configuration? Or is there any other place I can
>>>>>> check?
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>> --
>>> Harsh J
>>
>
>
> --
> Harsh J