[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815641#comment-13815641 ]

sathish commented on HDFS-5428:
-------------------------------

Vinay, while debugging this scenario with your patch, I observed a path mismatch
when counting the blocks of under-construction files in a snapshot. Because of this,
those blocks are not removed from the block threshold.
{code}
String fileSnapshotPath = StringUtils.replaceOnce(
          file,
          snapshottableDir,
          Snapshot.getSnapshotPath(snapshottableDir,
              Snapshot.getSnapshotName(snapshot)));
{code}
StringUtils.replaceOnce is not replacing the path correctly.
Logs for this:
2013-11-07 01:05:15,103 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: /.snapshot/snap_6ran/_temporary/0/_temporary/attempt_local1866843415_0001_m_000000_0/part-m-00000
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:5068)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:853)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:540)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:482)
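
The snippet above builds a file's path inside the snapshot by substituting the snapshot root for the first occurrence of the snapshottable directory. The following is a minimal plain-Java sketch with no Hadoop dependencies. It contrasts that replace-once idea with a component-aware prefix mapping. The names `SnapshotPathSketch`, `snapshotRoot`, `naiveReplaceOnce` and `toSnapshotPath` are hypothetical. The log above does not confirm that the mismatch comes from one of these edge cases; that is an assumption.

{code:java}
final class SnapshotPathSketch {
  // Name of the reserved snapshot directory in HDFS paths.
  static final String DOT_SNAPSHOT = ".snapshot";

  // Hypothetical helper: root of snapshot `name` of `dir`, e.g. /foo/.snapshot/s1.
  static String snapshotRoot(String dir, String name) {
    return (dir.equals("/") ? "" : dir) + "/" + DOT_SNAPSHOT + "/" + name;
  }

  // Naive mapping (hypothetical): replace the first occurrence of dir with the
  // snapshot root, the same idea as a replace-once string utility.
  static String naiveReplaceOnce(String file, String dir, String name) {
    int i = file.indexOf(dir);
    if (i < 0) {
      return file; // nothing to replace
    }
    return file.substring(0, i) + snapshotRoot(dir, name) + file.substring(i + dir.length());
  }

  // Hypothetical helper: maps a live path under dir to the same path inside
  // snapshot `name`, matching dir only on whole path components.
  static String toSnapshotPath(String file, String dir, String name) {
    if (dir.length() > 1 && dir.endsWith("/")) {
      dir = dir.substring(0, dir.length() - 1); // normalize a trailing slash
    }
    String root = snapshotRoot(dir, name);
    if (file.equals(dir)) {
      return root;
    }
    String prefix = dir.equals("/") ? "/" : dir + "/";
    if (!file.startsWith(prefix)) {
      throw new IllegalArgumentException(file + " is not under " + dir);
    }
    return root + "/" + file.substring(prefix.length());
  }

  public static void main(String[] args) {
    System.out.println(toSnapshotPath("/foo/test/bar", "/foo", "s1"));    // /foo/.snapshot/s1/test/bar
    System.out.println(toSnapshotPath("/out/part-m-00000", "/", "s1"));   // /.snapshot/s1/out/part-m-00000
    System.out.println(naiveReplaceOnce("/out/part-m-00000", "/", "s1")); // /.snapshot/s1out/part-m-00000
  }
}
{code}

When the snapshottable directory is the root, the naive substitution drops the separator after the snapshot name. A prefix such as /foo also matches /foobar, so the naive version rewrites paths that are not under the directory at all. The prefix-based version rejects those instead.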


> under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-5428
>                 URL: https://issues.apache.org/jira/browse/HDFS-5428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch
>
>
> 1. Allow snapshots on directory /foo
> 2. Create a file /foo/test/bar and start writing to it
> 3. Create a snapshot s1 of /foo after a block is allocated and some data has
> been written to the file
> 4. Delete the directory /foo/test
> 5. Wait until a checkpoint happens, or run saveNamespace
> 6. Restart the NN.
> The NN enters safemode.
> Analysis:
> Snapshot nodes loaded from the fsimage are always complete, so all of their
> blocks are in COMPLETE state.
> As a result, when the DataNode reports RBW blocks, those are not updated in
> the blocksMap, and some of the FINALIZED blocks are marked as corrupt due to
> a length mismatch.
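
The analysis above can be illustrated with a toy model of the safemode block check. This is a simplified sketch, not the NameNode's BlockManager logic. `SafeModeSketch`, `Block`, `isSafe`, `safeRatio` and the `ucInSnapshot`/`excludeUcSnapshotBlocks` flags are hypothetical. The model assumes every snapshot block is loaded as COMPLETE, as the analysis states. It covers only the safe-block ratio; the real check also involves DataNode counts and an extension period. The `excludeUcSnapshotBlocks` option roughly models the idea from the comment of removing these blocks from the block threshold. The threshold 0.999 is the default of `dfs.namenode.safemode.threshold-pct`.

{code:java}
import java.util.ArrayList;
import java.util.List;

final class SafeModeSketch {
  // Hypothetical, simplified replica states reported by a DataNode.
  enum Reported { RBW, FINALIZED }

  // Default value of dfs.namenode.safemode.threshold-pct.
  static final double THRESHOLD = 0.999;

  // Hypothetical view of one block whose stored copy was loaded as COMPLETE.
  static final class Block {
    final long storedLength;    // length recorded in the fsimage
    final Reported reported;    // replica state reported by the DataNode
    final long reportedLength;  // replica length reported by the DataNode
    final boolean ucInSnapshot; // file was under construction when snapshotted

    Block(long storedLength, Reported reported, long reportedLength, boolean ucInSnapshot) {
      this.storedLength = storedLength;
      this.reported = reported;
      this.reportedLength = reportedLength;
      this.ucInSnapshot = ucInSnapshot;
    }
  }

  // Because the stored block is COMPLETE, an RBW report is not applied and a
  // FINALIZED report with a different length is treated as corrupt; in both
  // cases the block does not count as safe.
  static boolean isSafe(Block b) {
    return b.reported == Reported.FINALIZED && b.reportedLength == b.storedLength;
  }

  // Fraction of counted blocks that are safe, optionally leaving blocks of
  // under-construction snapshot files out of the total.
  static double safeRatio(List<Block> blocks, boolean excludeUcSnapshotBlocks) {
    int total = 0;
    int safe = 0;
    for (Block b : blocks) {
      if (excludeUcSnapshotBlocks && b.ucInSnapshot) {
        continue;
      }
      total++;
      if (isSafe(b)) {
        safe++;
      }
    }
    return total == 0 ? 1.0 : (double) safe / total;
  }

  public static void main(String[] args) {
    List<Block> blocks = new ArrayList<>();
    for (int i = 0; i < 9; i++) {
      blocks.add(new Block(1024, Reported.FINALIZED, 1024, false)); // healthy blocks
    }
    blocks.add(new Block(512, Reported.RBW, 700, true));       // still being written
    blocks.add(new Block(512, Reported.FINALIZED, 600, true)); // length mismatch
    System.out.println(safeRatio(blocks, false) >= THRESHOLD); // false: stays in safemode
    System.out.println(safeRatio(blocks, true) >= THRESHOLD);  // true
  }
}
{code}

In this model the two snapshot blocks hold the ratio at 9/11, well below 0.999, so the NN never leaves safemode on its own. Excluding them from the total lets the ratio reach 1.0.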



--
This message was sent by Atlassian JIRA
(v6.1#6144)
