OK, it seems I have corrupted the file system. How can I recover from this?
 bin/hadoop fsck /
....
/tmp/hadoop-aolias/mapred/system/job_200803241610_0001/job.jar:  Under
replicated blk_4445907956276011533. Target Replicas is 10 but found 7
replica(s).
...........
/user/aolias/IDT/tm/GASS.0011.98100-0011.98200.zip: MISSING 1 blocks
of total size 14684276 B.
Status: CORRUPT
 Total size:    16621314 B
 Total dirs:    13
 Total files:   15
 Total blocks:  5 (avg. block size 3324262 B)
  ********************************
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         14684276 B
  ********************************
 Minimally replicated blocks:   4 (80.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (20.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.6
 Missing replicas:              3 (23.076923 %)
 Number of data-nodes:          13
 Number of racks:               1


The filesystem under path '/' is CORRUPT
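
I am guessing the only way out is to get rid of the file with the missing
block and copy it again (assuming the lost block really is unrecoverable and
the original zip is still under /home2/mtlinden/simdata/GASS-RDS-3-G/tm),
i.e. something along these lines:

 bin/hadoop fsck / -move      (move files with missing blocks to /lost+found)
 bin/hadoop fsck / -delete    (or simply delete the corrupt files)
 bin/hadoop dfs -copyFromLocal \
     /home2/mtlinden/simdata/GASS-RDS-3-G/tm/GASS.0011.98100-0011.98200.zip \
     /user/aolias/IDT/tm

Is that the right approach, or is there some way to recover the missing
block itself?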


On 24/03/2008, Alfonso Olias Sanz <[EMAIL PROTECTED]> wrote:
> Hi Ted
>  Thanks for the info. But running distcp I got this exception:
>
>  bin/hadoop distcp -update
>  "file:///home2/mtlinden/simdata/GASS-RDS-3-G/tm" "/user/aolias/IDT"
>
>  With failures, global counters are inaccurate; consider running with -i
>  Copy failed: org.apache.hadoop.ipc.RemoteException:
>  org.apache.hadoop.dfs.SafeModeException: Cannot create
>  file/tmp/hadoop-aolias/mapred/system/distcp_idcrwx/_distcp_src_files.
>  Name node is in safe mode.
>  Safe mode will be turned off automatically.
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:945)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:929)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:280)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:512)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1928)
>         at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
>         at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:123)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
>         at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:827)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:379)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
>         at org.apache.hadoop.util.CopyFiles.setup(CopyFiles.java:686)
>         at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:475)
>         at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)
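
(For the safe mode part I assume the options are just to wait, as the
message says, or to check / force it manually with something like:

  bin/hadoop dfsadmin -safemode get
  bin/hadoop dfsadmin -safemode wait
  bin/hadoop dfsadmin -safemode leave

is that right?)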
>
>
>
>  On 24/03/2008, Ted Dunning <[EMAIL PROTECTED]> wrote:
>  >
>  >
>  >  Copy from a machine that is *not* running as a data node in order to get
>  >  better balancing.  Using distcp may also help because the nodes actually
>  >  doing the copying will be spread across the cluster.
>  >
>  >  You should probably be running a rebalancing script as well if your nodes
>  >  have differing sizes.
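
(Side note: I take it the rebalancing script is the HDFS balancer, if this
Hadoop version already ships it; presumably something like

  bin/hadoop balancer -threshold 10

or bin/start-balancer.sh, which moves blocks from over-used to under-used
datanodes until each one is within the given percentage of the average
cluster utilisation.)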
>  >
>  >
>  >  On 3/24/08 7:35 AM, "Alfonso Olias Sanz" <[EMAIL PROTECTED]>
>  >  wrote:
>  >
>  >
>  >  > Hi
>  >  >
>  >  > I want to copy 1000 files (37GB) of data to the DFS.  I have a setup
>  >  > of 9-10 nodes; each one has between 5 and 15GB of free space.
>  >  >
>  >  > While copying the files from the local file system on nodeA, the node
>  >  > fills up with data and the process stalls.
>  >  >
>  >  > I have another free node with 80GB of free space. After adding the
>  >  > datanode to the cluster, I ran the same copy process again:
>  >  >
>  >  > hadoop dfs -copyFromLocal ...
>  >  >
>  >  > During the copy of these files to the DFS, I ran a Java application
>  >  > to check where the data is located (the replication level is set
>  >  > to 2):
>  >  >
>  >  > String [][] hostnames = dfs.getFileCacheHints(inFile, 0, 100L);
>  >  >
>  >  > The output I print is the following
>  >  >
>  >  > File name = GASS.0011.63800-0011.63900.zip
>  >  > File cache hints =   gaiawl07.net4.lan gaiawl02.net4.lan
>  >  > ############################################
>  >  > File name = GASS.0011.53100-0011.53200.zip
>  >  > File cache hints =   gaiawl03.net4.lan gaiawl02.net4.lan
>  >  > ############################################
>  >  > File name = GASS.0011.23800-0011.23900.zip
>  >  > File cache hints =   gaiawl08.net4.lan gaiawl02.net4.lan
>  >  > ############################################
>  >  > File name = GASS.0011.18800-0011.18900.zip
>  >  > File cache hints =   gaiawl02.net4.lan gaiawl06.net4.lan
>  >  > ....
>  >  >
>  >  > In this small sample gaiawl02.net4.lan appears for every file, and
>  >  > the same is happening for every copied file.  I launch the copy
>  >  > process from that machine, which is also the one with 80GB of free
>  >  > space.  I did this because of the problem I mentioned previously of
>  >  > filling up a node and stalling the copy operation.
>  >  >
>  >  > Shouldn't the data be dispersed across all the nodes?  If that data
>  >  > node crashes, only 1 replica of the data will exist in the
>  >  > cluster.
>  >  >
>  >  > During the "staging" phase I understand that that particular node
>  >  > contains a local copy of the file being added to HDFS. But once a
>  >  > block is filled, that doesn't mean the block also has to be stored
>  >  > on that node. Am I right?
>  >  >
>  >  > Is it possible to spread the data among all the data nodes, so that
>  >  > no single node ends up holding 1 replica of every copied file?
>  >  >
>  >  > thanks
>  >
>  >
>
