Thomas,

Is your /tmp mount point also under / or is it mounted separately? Your dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs, and if /tmp is a separate mount, what is the available space on it?

(It is a bad idea in production to keep things at their defaults under /tmp, though, like dfs.name.dir and dfs.data.dir; reconfigure and restart as necessary.)
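For example, a minimal hdfs-site.xml sketch that moves both off /tmp (the /var/lib/hadoop paths below are only placeholders; point them at whatever persistent disks you actually have):

    <property>
      <name>dfs.name.dir</name>
      <value>/var/lib/hadoop/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/var/lib/hadoop/dfs/data</value>
    </property>

On a fresh test cluster the simplest route is to stop the daemons, change this on all nodes, re-run `hadoop namenode -format`, and start again; on a cluster that already holds data you would copy the old name/data directories over instead of reformatting.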
On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson <t.dt.aander...@gmail.com> wrote:
> 1.) The disk usage (with df -kh) on the namenode (server01):
>
> Filesystem  Size  Used  Avail  Use%  Mounted on
> /dev/sda1   9.4G  2.3G  6.7G   25%   /
>
> and on the datanodes (server02 ~ server05):
> /dev/sda1   9.4G  2.2G  6.8G   25%   /
> /dev/sda1   9.4G  2.2G  6.8G   25%   /
> /dev/sda1   9.4G  2.2G  6.8G   25%   /
> /dev/sda1   9.4G  2.2G  6.8G   25%   /
>
> 2.) How can I make sure whether a datanode is busy? The environment is only
> for testing, so no other user processes are running at that moment. It is
> also a fresh installation; only the packages Hadoop requires are installed,
> such as hadoop and the jdk.
>
> 3.) fs.block.size is not set in hdfs-site.xml on either the datanodes or
> the namenode, because this is just for testing. I thought it would use the
> default value, which should be 512?
>
> 4.) What might be a good way to quickly check whether the network is
> unstable? I check the health page, e.g. server01:50070/dfshealth.jsp, where
> the live nodes are up and "Last Contact" varies between checks.
>
> Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
> server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
> server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96          0
> server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96          0
> server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>
> 5.) Only the command `hadoop fs -put /tmp/testfile test` is issued, as it
> is just to test whether the installation works. The file, e.g. testfile,
> is removed first (hadoop fs -rm test/testfile) and then uploaded again with
> the hadoop put command.
>
> The logs are listed below:
>
> namenode:
> server01: http://pastebin.com/TLpDmmPx
>
> datanodes:
> server02: http://pastebin.com/pdE5XKfi
> server03: http://pastebin.com/4aV7ECCV
> server04: http://pastebin.com/tF7HiRZj
> server05: http://pastebin.com/5qwSPrvU
>
> Please let me know if more information needs to be provided.
>
> I really appreciate your suggestions.
>
> Thank you.
>
>
> On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy <brahmared...@huawei.com> wrote:
>> Hi,
>>
>> Judging by this exception (could only be replicated to 0 nodes, instead
>> of 1), no datanode is available to the Name Node.
>>
>> These are the cases in which a Data Node may be unavailable to the Name Node:
>>
>> 1) The Data Node disk is full.
>>
>> 2) The Data Node is busy with its block report and block scanning.
>>
>> 3) The block size is a negative value (dfs.block.size in hdfs-site.xml).
>>
>> 4) The primary datanode goes down while a write is in progress (any
>> network fluctuation between the Name Node and Data Node machines).
>>
>> 5) Whenever we append a partial chunk and call sync, for subsequent
>> partial chunk appends the client should keep the previous data in its
>> buffer. For example, after appending "a" I call sync, and when I then
>> append more data the buffer should hold "ab". On the server side, when
>> the chunk is not a multiple of 512, it tries to compare the CRC of the
>> data present in the block file against the CRC present in the meta file,
>> but while constructing the CRC for the data in the block it always
>> compares only up to the initial offset.
>>
>> For more analysis, please check the data node logs.
>>
>> Warm Regards,
>>
>> Brahma Reddy
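(On Brahma's causes 1 and 2 above: a quick way to check them from the NameNode side, sketched for 0.20.x; the exact report format differs across versions:

    hadoop dfsadmin -report              # per-DN configured/remaining capacity and last contact, as the NN sees them
    df -h /tmp/hadoop-user/dfs/data      # on each DN: space on the filesystem backing dfs.data.dir

If -report shows 0 GB remaining on every node, or a stale last-contact time, that points at cause 1 or 4.)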
>> -----Original Message-----
>> From: Thomas Anderson [mailto:t.dt.aander...@gmail.com]
>> Sent: Friday, July 15, 2011 9:09 AM
>> To: hdfs-user@hadoop.apache.org
>> Subject: could only be replicated to 0 nodes, instead of 1
>>
>> I have a fresh hadoop 0.20.2 installation on virtualbox 4.0.8 with jdk
>> 1.6.0_26. The problem is that when trying to put a file to hdfs, it throws
>> the error `org.apache.hadoop.ipc.RemoteException: java.io.IOException:
>> File /path/to/file could only be replicated to 0 nodes, instead of 1';
>> however, there is no problem creating a folder, as the ls command prints:
>>
>> Found 1 items
>> drwxr-xr-x   - user supergroup          0 2011-07-15 11:09 /user/user/test
>>
>> I also tried flushing the firewall (removing all iptables restrictions),
>> but the error is still thrown when uploading a file from the local fs
>> (hadoop fs -put /tmp/x test).
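(Since iptables was already flushed: to rule out plain reachability problems, note that the DataNode data-transfer port defaults to 50010 and the NameNode RPC port here is 9000, so a rough check, assuming nc is installed, would be:

    nc -z -v server02 50010    # repeat for server03..server05, from the client machine
    nc -z -v lab01 9000        # from each datanode, toward the NameNode

Both should report the port as open.)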
>> The name node log shows:
>>
>> 2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> aaa.bbb.ccc.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
>> 2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
>> 2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
>> 2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> aaa.bbb.ccc.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
>> 2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> aaa.bbb.ccc.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
>> 2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/aaa.bbb.ccc.25:50010
>>
>> And all datanodes have similar messages, as below:
>>
>> 2011-07-15 10:42:46,562 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
>> 2011-07-15 10:42:47,163 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> BlockReport of 0 blocks got processed in 3 msecs
>> 2011-07-15 10:42:47,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> Starting Periodic block scanner.
>> 2011-07-15 11:19:42,931 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> BlockReport of 0 blocks got processed in 1 msecs
>>
>> Command `hadoop fsck /` displays:
>>
>> Status: HEALTHY
>>  Total size:                  0 B
>>  Total dirs:                  3
>>  Total files:                 0 (Files currently being written: 1)
>>  Total blocks (validated):    0
>>  Minimally replicated blocks: 0
>>  Over-replicated blocks:      0
>>  Under-replicated blocks:     0
>>  Mis-replicated blocks:       0
>>  Default replication factor:  3
>>  Average block replication:   0.0
>>  Corrupt blocks:              0
>>  Missing replicas:            0
>>  Number of data-nodes:        4
>>
>> The settings in conf include:
>>
>> - Master node:
>> core-site.xml
>> <property>
>>   <name>fs.default.name</name>
>>   <value>hdfs://lab01:9000/</value>
>> </property>
>>
>> hdfs-site.xml
>> <property>
>>   <name>dfs.replication</name>
>>   <value>3</value>
>> </property>
>>
>> - Slave nodes:
>> core-site.xml
>> <property>
>>   <name>fs.default.name</name>
>>   <value>hdfs://lab01:9000/</value>
>> </property>
>>
>> hdfs-site.xml
>> <property>
>>   <name>dfs.replication</name>
>>   <value>3</value>
>> </property>
>>
>> Am I missing any configuration? Or is there anywhere else I can check?
>>
>> Thanks.

--
Harsh J