Thomas,

Your problem might lie simply with the virtual nodes' DNs using /tmp, with tmpfs backing that mount -- which is somehow causing the reported free space to show up as 0 in the reports to the NN (master). Note the tmpfs line in your df output:

tmpfs                 101M   44K  101M   1% /tmp

This causes your trouble: the NN can't choose a suitable DN to write to, because it determines that none has at least a block's worth of space (64MB default) available for writes. You can resolve it as follows:

1. Stop DFS completely.
2. Create a directory somewhere under / (I use Cloudera's distro, and its default configured location for data files is /var/lib/hadoop-0.20/cache/, if you need an idea for a location) and set it as your hadoop.tmp.dir in core-site.xml on all the nodes.
3. Reformat your NameNode (hadoop namenode -format, answer Y) and restart DFS.

Things _should_ be OK now. Config example (core-site.xml):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-0.20/cache</value>
</property>
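(The same steps as commands, roughly -- just a sketch, assuming a tarball-style install with the Hadoop bin/ scripts on the PATH and a "hadoop" user running the daemons; adjust names and paths to your setup:)

stop-dfs.sh                        # 1. stop DFS completely

# 2. On every node: create the new storage root and make sure the
#    user running the daemons owns it (user/group assumed here),
#    then set hadoop.tmp.dir in core-site.xml as above.
mkdir -p /var/lib/hadoop-0.20/cache
chown hadoop:hadoop /var/lib/hadoop-0.20/cache

hadoop namenode -format            # 3. reformat the NameNode (answer Y)
start-dfs.sh                       #    and restart DFS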
Let us know if this still doesn't get your dev cluster up and running for action :)

On Fri, Jul 15, 2011 at 4:40 PM, Thomas Anderson <t.dt.aander...@gmail.com> wrote:
> When doing the partitioning, I remember that only / and swap were
> specified for all nodes during creation. So I think /tmp is also
> mounted under /, which should have around 9G of space. The total hard
> disk size specified is 10G.
>
> df -kh shows
>
> server01:
> /dev/sda1             9.4G  2.3G  6.7G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M  132K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server02:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server03:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server04:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> server05:
> /dev/sda1             9.4G  2.2G  6.8G  25% /
> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
> tmpfs                 101M   44K  101M   1% /tmp
> udev                  247M     0  247M   0% /dev
> tmpfs                 101M     0  101M   0% /var/run/shm
> tmpfs                  51M  176K   51M   1% /var/run
>
> In addition, the output of du -sk /tmp/hadoop-user/dfs is
>
> server02:
> 8       /tmp/hadoop-user/dfs/
>
> server03:
> 8       /tmp/hadoop-user/dfs/
>
> server04:
> 8       /tmp/hadoop-user/dfs/
>
> server05:
> 8       /tmp/hadoop-user/dfs/
>
> On Fri, Jul 15, 2011 at 7:01 PM, Harsh J <ha...@cloudera.com> wrote:
>> (P.s. I asked that because, if you look at your NN's live nodes
>> table, the reported space is all 0.)
>>
>> What's the output of:
>>
>> du -sk /tmp/hadoop-user/dfs
>>
>> on all your DNs?
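>> (E.g., to gather that from every DN in one shot -- just a sketch,
>> assuming passwordless ssh to the slaves under these hostnames:)
>>
>> for h in server02 server03 server04 server05; do
>>   echo "== $h"                          # label each node's output
>>   ssh "$h" du -sk /tmp/hadoop-user/dfs  # DFS dir usage on that DN
>> done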
>>
>> On Fri, Jul 15, 2011 at 4:01 PM, Harsh J <ha...@cloudera.com> wrote:
>>> Thomas,
>>>
>>> Is your /tmp mount point also under /, or is it separate? Your
>>> dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs; if /tmp is
>>> separately mounted, what's the available space on it?
>>>
>>> (It's a bad idea in production to keep things like dfs.name.dir and
>>> dfs.data.dir at their /tmp defaults, though -- reconfigure and
>>> restart as necessary.)
>>>
>>> On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson
>>> <t.dt.aander...@gmail.com> wrote:
>>>> 1.) The disk usage (with df -kh) on the namenode (server01):
>>>>
>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>>>>
>>>> and on the datanodes (server02 ~ server05):
>>>>
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>
>>>> 2.) How can I check whether a datanode is busy? The environment is
>>>> only for testing, so no other user processes are running at the
>>>> moment. It is also a fresh installation; only the packages Hadoop
>>>> requires, such as hadoop and the JDK, are installed.
>>>>
>>>> 3.) fs.block.size is not set in hdfs-site.xml on either the
>>>> datanodes or the namenode, because this setup is just for testing.
>>>> I thought it would use the default value, which should be 512?
>>>>
>>>> 4.) What might be a good way to quickly check whether the network
>>>> is unstable? I check the health page, e.g.
>>>> server01:50070/dfshealth.jsp, where the live nodes are up and Last
>>>> Contact varies between checks.
>>>>
>>>> Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>>>> server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>> server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>> server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>> server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>
>>>> 5.) Only the command `hadoop fs -put /tmp/testfile test` is issued,
>>>> as it is just to test whether the installation works. The file
>>>> (e.g. testfile) is removed first (hadoop fs -rm test/testfile),
>>>> then uploaded again with the hadoop put command.
>>>>
>>>> The logs are listed below:
>>>>
>>>> namenode:
>>>> server01: http://pastebin.com/TLpDmmPx
>>>>
>>>> datanodes:
>>>> server02: http://pastebin.com/pdE5XKfi
>>>> server03: http://pastebin.com/4aV7ECCV
>>>> server04: http://pastebin.com/tF7HiRZj
>>>> server05: http://pastebin.com/5qwSPrvU
>>>>
>>>> Please let me know if more information needs to be provided.
>>>>
>>>> I really appreciate your suggestions.
>>>>
>>>> Thank you.
>>>>
>>>> On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy <brahmared...@huawei.com> wrote:
>>>>> Hi,
>>>>>
>>>>> From this exception (could only be replicated to 0 nodes, instead
>>>>> of 1), it appears no Data Node is available to the Name Node.
>>>>>
>>>>> These are the cases in which a Data Node may not be available to
>>>>> the Name Node:
>>>>>
>>>>> 1) The Data Node disk is full.
>>>>>
>>>>> 2) The Data Node is busy with its block report and block scanning.
>>>>>
>>>>> 3) The block size is a negative value (dfs.block.size in
>>>>> hdfs-site.xml).
>>>>>
>>>>> 4) The primary Data Node goes down while a write is in progress
>>>>> (any n/w fluctuations between the Name Node and Data Node
>>>>> machines).
>>>>>
>>>>> 5) Whenever we append a partial chunk and call sync, for
>>>>> subsequent partial-chunk appends the client should keep the
>>>>> previous data in its buffer. For example, after appending "a" I
>>>>> have called sync, and when I try the next append the buffer should
>>>>> hold "ab". On the server side, when the chunk is not a multiple of
>>>>> 512, it will compare the CRC of the data present in the block file
>>>>> against the CRC present in the meta file. But while constructing
>>>>> the CRC for the data present in the block, it always compares only
>>>>> up to the initial offset.
>>>>>
>>>>> Or, for more analysis, please check the Data Node logs.
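>>>>> (A quick first check that narrows these down -- just a suggestion:
>>>>> `hadoop dfsadmin -report` prints each Data Node's configured
>>>>> capacity and DFS remaining, so case 1, or the all-zero capacity
>>>>> you are seeing, shows up immediately:)
>>>>>
>>>>> hadoop dfsadmin -report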
>>>>>
>>>>> Warm Regards,
>>>>>
>>>>> Brahma Reddy
>>>>>
>>>>> -----Original Message-----
>>>>> From: Thomas Anderson [mailto:t.dt.aander...@gmail.com]
>>>>> Sent: Friday, July 15, 2011 9:09 AM
>>>>> To: hdfs-user@hadoop.apache.org
>>>>> Subject: could only be replicated to 0 nodes, instead of 1
>>>>>
>>>>> I have a fresh hadoop 0.20.2 installation on virtualbox 4.0.8 with
>>>>> jdk 1.6.0_26. The problem is that when trying to put a file to
>>>>> hdfs, it throws the error `org.apache.hadoop.ipc.RemoteException:
>>>>> java.io.IOException: File /path/to/file could only be replicated
>>>>> to 0 nodes, instead of 1'; however, there is no problem creating a
>>>>> folder, as the ls command prints the result
>>>>>
>>>>> Found 1 items
>>>>> drwxr-xr-x   - user supergroup          0 2011-07-15 11:09 /user/user/test
>>>>>
>>>>> I also tried flushing the firewall (removing all iptables
>>>>> restrictions), but the error message is still thrown when
>>>>> uploading a file (hadoop fs -put /tmp/x test) from the local fs.
>>>>>
>>>>> The name node log shows
>>>>>
>>>>> 2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
>>>>> 2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
>>>>> 2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
>>>>> 2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
>>>>> 2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange:
>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>> aaa.bbb.ccc.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
>>>>> 2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology:
>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.25:50010
>>>>>
>>>>> And all datanodes have messages similar to the following:
>>>>>
>>>>> 2011-07-15 10:42:46,562 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: using
>>>>> BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
>>>>> 2011-07-15 10:42:47,163 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>> blocks got processed in 3 msecs
>>>>> 2011-07-15 10:42:47,187 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
>>>>> block scanner.
>>>>> 2011-07-15 11:19:42,931 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>> blocks got processed in 1 msecs
>>>>>
>>>>> The command `hadoop fsck /` displays
>>>>>
>>>>> Status: HEALTHY
>>>>>  Total size:    0 B
>>>>>  Total dirs:    3
>>>>>  Total files:   0 (Files currently being written: 1)
>>>>>  Total blocks (validated):      0
>>>>>  Minimally replicated blocks:   0
>>>>>  Over-replicated blocks:        0
>>>>>  Under-replicated blocks:       0
>>>>>  Mis-replicated blocks:         0
>>>>>  Default replication factor:    3
>>>>>  Average block replication:     0.0
>>>>>  Corrupt blocks:                0
>>>>>  Missing replicas:              0
>>>>>  Number of data-nodes:          4
>>>>>
>>>>> The settings in conf include:
>>>>>
>>>>> - Master node:
>>>>> core-site.xml
>>>>> <property>
>>>>>   <name>fs.default.name</name>
>>>>>   <value>hdfs://lab01:9000/</value>
>>>>> </property>
>>>>>
>>>>> hdfs-site.xml
>>>>> <property>
>>>>>   <name>dfs.replication</name>
>>>>>   <value>3</value>
>>>>> </property>
>>>>>
>>>>> - Slave nodes:
>>>>> core-site.xml
>>>>> <property>
>>>>>   <name>fs.default.name</name>
>>>>>   <value>hdfs://lab01:9000/</value>
>>>>> </property>
>>>>>
>>>>> hdfs-site.xml
>>>>> <property>
>>>>>   <name>dfs.replication</name>
>>>>>   <value>3</value>
>>>>> </property>
>>>>>
>>>>> Am I missing any configuration? Or is there any place I can check?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>> --
>> Harsh J
>>
>

--
Harsh J