OK, it seems my file system is corrupted. How can I recover from this?

bin/hadoop fsck /
....
/tmp/hadoop-aolias/mapred/system/job_200803241610_0001/job.jar: Under replicated blk_4445907956276011533. Target Replicas is 10 but found 7 replica(s).
...........
/user/aolias/IDT/tm/GASS.0011.98100-0011.98200.zip: MISSING 1 blocks of total size 14684276 B.
Status: CORRUPT
 Total size:    16621314 B
 Total dirs:    13
 Total files:   15
 Total blocks:  5 (avg. block size 3324262 B)
  ********************************
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         14684276 B
  ********************************
 Minimally replicated blocks:   4 (80.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (20.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.6
 Missing replicas:              3 (23.076923 %)
 Number of data-nodes:          13
 Number of racks:               1
The filesystem under path '/' is CORRUPT

On 24/03/2008, Alfonso Olias Sanz <[EMAIL PROTECTED]> wrote:
> Hi Ted
> Thanks for the info. But running the distcp I got this exception:
>
> bin/hadoop distcp -update "file:///home2/mtlinden/simdata/GASS-RDS-3-G/tm" "/user/aolias/IDT"
>
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.dfs.SafeModeException: Cannot create
> file/tmp/hadoop-aolias/mapred/system/distcp_idcrwx/_distcp_src_files.
> Name node is in safe mode.
> Safe mode will be turned off automatically.
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:945)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:929)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:280)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:512)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1928)
>         at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
>         at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:123)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
>         at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:827)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:379)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
>         at org.apache.hadoop.util.CopyFiles.setup(CopyFiles.java:686)
>         at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:475)
>         at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)
>
> On 24/03/2008, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> > Copy from a machine that is *not* running as a data node in order to get
> > better balancing. Using distcp may also help, because the nodes actually
> > doing the copying will be spread across the cluster.
> >
> > You should probably also be running a rebalancing script if your nodes
> > have differing sizes.
> >
> > On 3/24/08 7:35 AM, "Alfonso Olias Sanz" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi
> > >
> > > I want to copy 1000 files (37 GB) of data to the DFS. I have a set-up
> > > of 9-10 nodes, each one with between 5 and 15 GB of free space.
> > >
> > > While copying the files from the local file system on nodeA, the node
> > > fills up with data and the process stalls.
> > >
> > > I have another free node with 80 GB of free space. After adding that
> > > datanode to the cluster, I ran the same copy process again:
> > >
> > > hadoop dfs -copyFromLocal ...
> > > During the copy of these files to the DFS, I ran a small Java
> > > application to check where the data is located (the replication
> > > level is set to 2):
> > >
> > > String[][] hostnames = dfs.getFileCacheHints(inFile, 0, 100L);
> > >
> > > The output I print is the following:
> > >
> > > File name = GASS.0011.63800-0011.63900.zip
> > > File cache hints = gaiawl07.net4.lan gaiawl02.net4.lan
> > > ############################################
> > > File name = GASS.0011.53100-0011.53200.zip
> > > File cache hints = gaiawl03.net4.lan gaiawl02.net4.lan
> > > ############################################
> > > File name = GASS.0011.23800-0011.23900.zip
> > > File cache hints = gaiawl08.net4.lan gaiawl02.net4.lan
> > > ############################################
> > > File name = GASS.0011.18800-0011.18900.zip
> > > File cache hints = gaiawl02.net4.lan gaiawl06.net4.lan
> > > ....
> > >
> > > In this small sample gaiawl02.net4.lan appears for every file, and the
> > > same is currently true for every copied file. I launch the copy
> > > process from that machine, which is also the one with 80 GB of free
> > > space. I did this because of the problem I mentioned earlier of
> > > filling up a node and stalling the copy operation.
> > >
> > > Shouldn't the data be dispersed across all the nodes? Because if that
> > > data node crashes, only one replica of the data will exist in the
> > > cluster.
> > >
> > > During the "staging" phase I understand that that particular node
> > > holds a local copy of the file being added to HDFS. But once a block
> > > is filled, that doesn't mean the block also has to stay on that node.
> > > Am I right?
> > >
> > > Is it possible to spread the data among all the data nodes, to avoid
> > > one node keeping a replica of every copied file?
> > >
> > > thanks
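[Editor's note] The two problems in this thread are linked: distcp fails because the namenode is still in safe mode, and it stays in safe mode while fsck reports missing blocks; the placement skew is HDFS's default behavior of writing the first replica on the writer's local datanode. A minimal recovery sketch follows, assuming a 0.16-era CLI; the cluster commands are shown as comments (they need a live cluster, so verify them against your version's `fsck`/`dfsadmin` usage output), while the tally below runs offline against the `File cache hints` sample quoted above.

```shell
#!/bin/sh
# Hedged recovery sequence (comments only -- requires a live cluster):
#   bin/hadoop dfsadmin -safemode get    # confirm the namenode is in safe mode
#   bin/hadoop fsck / -move              # move files with missing blocks to /lost+found
#   bin/hadoop fsck / -delete            # ...or drop the corrupt files outright
#   bin/hadoop dfsadmin -safemode leave  # only if it does not exit on its own
#   bin/hadoop balancer                  # then spread the surviving replicas
#                                        #   (balancer shipped around 0.16; check your release)

# Offline check of the placement skew: count how often each host appears
# in the "File cache hints" lines from the sample output above.
cat <<'EOF' | tr ' ' '\n' | sort | uniq -c | sort -rn
gaiawl07.net4.lan gaiawl02.net4.lan
gaiawl03.net4.lan gaiawl02.net4.lan
gaiawl08.net4.lan gaiawl02.net4.lan
gaiawl02.net4.lan gaiawl06.net4.lan
EOF
```

On the four sample files the tally shows gaiawl02.net4.lan four times and every other host once, which matches the "one replica of everything lands on the writing node" pattern described in the question; copying from a machine that is not a datanode, as Ted suggests, avoids it.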