(P.S. I asked because, if you look at your NN's live-nodes table, the reported space is all 0.)
What's the output of `du -sk /tmp/hadoop-user/dfs` on all your DNs?

On Fri, Jul 15, 2011 at 4:01 PM, Harsh J <ha...@cloudera.com> wrote:
> Thomas,
>
> Is your /tmp mount point also under / or is it separate? Your
> dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs, and if /tmp is
> separately mounted, what is the available space on it?
>
> (It is a bad idea in production to keep things like dfs.name.dir and
> dfs.data.dir at their defaults under /tmp -- reconfigure and restart
> as necessary.)
>
> On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson
> <t.dt.aander...@gmail.com> wrote:
>> 1) The disk usage (with df -kh) on the namenode (server01):
>>
>> Filesystem  Size  Used  Avail  Use%  Mounted on
>> /dev/sda1   9.4G  2.3G  6.7G   25%   /
>>
>> and on the datanodes (server02 ~ server05):
>>
>> /dev/sda1   9.4G  2.2G  6.8G   25%   /
>> /dev/sda1   9.4G  2.2G  6.8G   25%   /
>> /dev/sda1   9.4G  2.2G  6.8G   25%   /
>> /dev/sda1   9.4G  2.2G  6.8G   25%   /
>>
>> 2) How can I check whether a datanode is busy? The environment is
>> only for testing, so no other user processes were running at that
>> moment. It is also a fresh installation; only the packages Hadoop
>> requires are installed, i.e. hadoop and the JDK.
>>
>> 3) fs.block.size is not set in hdfs-site.xml on either the datanodes
>> or the namenode, because this is just a test setup. I thought it
>> would use the default value, which should be 512?
>>
>> 4) What might be a good way to quickly check whether the network is
>> unstable? I checked the health page (server01:50070/dfshealth.jsp),
>> where the live nodes are up and "Last Contact" varies between checks:
>>
>> Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>> server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96         0
>> server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96         0
>> server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96         0
>> server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96         0
>>
>> 5)
>> Only the command `hadoop fs -put /tmp/testfile test` is issued, as
>> this is just to test whether the installation works. So the file
>> (e.g. testfile) is removed first (hadoop fs -rm test/testfile), then
>> uploaded again with the hadoop put command.
>>
>> The logs are listed below:
>>
>> namenode:
>> server01: http://pastebin.com/TLpDmmPx
>>
>> datanodes:
>> server02: http://pastebin.com/pdE5XKfi
>> server03: http://pastebin.com/4aV7ECCV
>> server04: http://pastebin.com/tF7HiRZj
>> server05: http://pastebin.com/5qwSPrvU
>>
>> Please let me know if more information needs to be provided.
>>
>> I really appreciate your suggestions.
>>
>> Thank you.
>>
>>
>> On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy <brahmared...@huawei.com> wrote:
>>> Hi,
>>>
>>> Judging from this exception ("could only be replicated to 0 nodes,
>>> instead of 1"), no datanode is available to the NameNode.
>>>
>>> These are the cases in which a DataNode may be unavailable to the
>>> NameNode:
>>>
>>> 1) The DataNode's disk is full.
>>>
>>> 2) The DataNode is busy with its block report and block scanning.
>>>
>>> 3) The block size is a negative value (dfs.block.size in
>>> hdfs-site.xml).
>>>
>>> 4) The primary datanode goes down while a write is in progress (any
>>> network fluctuation between the NameNode and DataNode machines).
>>>
>>> 5) Whenever we append a partial chunk and call sync, then for the
>>> subsequent partial-chunk appends the client should keep the previous
>>> data in its buffer. For example, after appending "a" I call sync, and
>>> when I then try to append, the buffer should hold "ab".
>>>
>>> On the server side, when the chunk is not a multiple of 512, it will
>>> try to compare the CRC computed over the data present in the block
>>> file with the CRC present in the meta file.
>>> But while constructing the CRC for the data present in the block, it
>>> always compares only up to the initial offset.
>>>
>>> Or, for more analysis, please check the datanode logs.
>>>
>>> Warm Regards,
>>>
>>> Brahma Reddy
>>>
>>> -----Original Message-----
>>> From: Thomas Anderson [mailto:t.dt.aander...@gmail.com]
>>> Sent: Friday, July 15, 2011 9:09 AM
>>> To: hdfs-user@hadoop.apache.org
>>> Subject: could only be replicated to 0 nodes, instead of 1
>>>
>>> I have a fresh Hadoop 0.20.2 installed on VirtualBox 4.0.8 with JDK
>>> 1.6.0_26. The problem is that when trying to put a file to HDFS, it
>>> throws the error `org.apache.hadoop.ipc.RemoteException:
>>> java.io.IOException: File /path/to/file could only be replicated to 0
>>> nodes, instead of 1'; however, there is no problem creating a folder,
>>> as the ls command prints:
>>>
>>> Found 1 items
>>> drwxr-xr-x   - user supergroup   0 2011-07-15 11:09 /user/user/test
>>>
>>> I also tried flushing the firewall (removing all iptables
>>> restrictions), but the error is still thrown when uploading a file
>>> from the local fs (hadoop fs -put /tmp/x test).
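The first and third causes on the list above (full disk, bogus block size) can be checked with a short script on each node; a minimal sketch follows, alongside the `du -sk` usage check discussed in this thread. The conf path is an assumption (HADOOP_CONF_DIR may differ on your install), and note that in 0.20.x the default for dfs.block.size when unset is 67108864 (64 MB), not 512 (512 is io.bytes.per.checksum, the checksum chunk size).

```shell
# dfs.data.dir defaults to /tmp/hadoop-${user}/dfs/data; path from this thread.
datadir=/tmp/hadoop-user/dfs
[ -d "$datadir" ] && du -sk "$datadir"            # space the DN has actually used

# Cause 1: is the disk holding dfs.data.dir full?
avail_kb=$(df -kP /tmp | awk 'NR==2 {print $4}')  # -P: POSIX format, no line wrap
echo "available on /tmp mount: ${avail_kb} KB"

# Cause 3: is dfs.block.size set to a bogus (e.g. negative) value?
conf="${HADOOP_CONF_DIR:-conf}/hdfs-site.xml"     # conf location is an assumption
if [ -f "$conf" ] && grep -q dfs.block.size "$conf"; then
  grep -A1 dfs.block.size "$conf"
else
  echo "dfs.block.size unset; 0.20.x default 67108864 (64 MB) applies"
fi
```

Run it on every DN (e.g. via ssh to server02..server05) and compare the reported free space against the capacity the NN shows on dfshealth.jsp.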
>>>
>>> The namenode log shows:
>>>
>>> 2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from aaa.bbb.ccc.ddd.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
>>> 2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
>>> 2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
>>> 2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from aaa.bbb.ccc.ddd.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
>>> 2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 140.127.220.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
>>> 2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.25:50010
>>>
>>> And all datanodes have messages similar to:
>>>
>>> 2011-07-15 10:42:46,562 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
>>> 2011-07-15 10:42:47,163 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 3 msecs
>>> 2011-07-15 10:42:47,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
>>> 2011-07-15 11:19:42,931 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 1 msecs
>>>
>>> The command `hadoop fsck /` displays:
>>>
>>> Status: HEALTHY
>>>  Total size:                    0 B
>>>  Total dirs:                    3
>>>  Total files:                   0 (Files currently being written: 1)
>>>  Total blocks (validated):      0
>>>  Minimally replicated blocks:   0
>>>  Over-replicated blocks:        0
>>>  Under-replicated blocks:       0
>>>  Mis-replicated blocks:         0
>>>  Default replication factor:    3
>>>  Average block replication:     0.0
>>>  Corrupt blocks:                0
>>>  Missing replicas:              0
>>>  Number of data-nodes:          4
>>>
>>> The settings in conf include:
>>>
>>> - Master node:
>>> core-site.xml
>>>   <property>
>>>     <name>fs.default.name</name>
>>>     <value>hdfs://lab01:9000/</value>
>>>   </property>
>>>
>>> hdfs-site.xml
>>>   <property>
>>>     <name>dfs.replication</name>
>>>     <value>3</value>
>>>   </property>
>>>
>>> - Slave nodes:
>>> core-site.xml
>>>   <property>
>>>     <name>fs.default.name</name>
>>>     <value>hdfs://lab01:9000/</value>
>>>   </property>
>>>
>>> hdfs-site.xml
>>>   <property>
>>>     <name>dfs.replication</name>
>>>     <value>3</value>
>>>   </property>
>>>
>>> Am I missing any configuration? Or is there any other place I can check?
>>>
>>> Thanks.
>>>
>>>
>>
>
>
>
> --
> Harsh J

--
Harsh J
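The fix suggested earlier in the thread (not leaving dfs.name.dir and dfs.data.dir at their /tmp defaults) amounts to two extra properties in hdfs-site.xml. A minimal sketch, where the /var/lib/hadoop paths are illustrative assumptions rather than values from this thread:

```xml
<!-- illustrative non-/tmp paths (assumed); choose durable local disks,
     since /tmp is typically cleared on reboot -->
<property>
  <name>dfs.name.dir</name>   <!-- NameNode metadata; set on the master -->
  <value>/var/lib/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>   <!-- DataNode block storage; set on each DN -->
  <value>/var/lib/hadoop/dfs/data</value>
</property>
```

After changing these, restart the HDFS daemons (and reformat the namenode if this is still a throwaway test cluster), then retry the `hadoop fs -put` test.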