[jira] [Commented] (HDFS-6618) Edit log corruption may still happen even after HDFS-6527

Kihwal Lee (JIRA) Tue, 01 Jul 2014 13:47:20 -0700

    [ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049309#comment-14049309 ]


Kihwal Lee commented on HDFS-6618:
----------------------------------

I guess we can move it inside the first lock, since it is already holding the 
directory write lock. Not many types of ops will go through anyway. But if we 
remove the inodes from the inode map as we unlink them, instead of building up 
a potentially huge data structure and removing them all at once, it may be 
faster and cheaper.

Is there a clean way to remove each inode from the inode map from 
{{destroyAndCollectBlocks()}} of {{INodeFile}} and {{INodeDirectory}}?
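
To make the question concrete, here is a rough sketch of what removing inodes 
as they are unlinked could look like. Everything in it is hypothetical and 
heavily simplified: {{SketchINode}}, {{SketchFile}}, {{SketchDirectory}} and the 
extra {{inodeMap}} parameter are stand-ins for illustration, not the real HDFS 
classes or signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real HDFS INode classes.
abstract class SketchINode {
    final long id;

    SketchINode(long id) {
        this.id = id;
    }

    // Collects the blocks to delete and, in the same pass, removes this
    // inode (and any descendants) from the inode map.
    // Passing the map in as a parameter is an assumption of this sketch.
    abstract void destroyAndCollectBlocks(Map<Long, SketchINode> inodeMap,
                                          List<String> collectedBlocks);
}

class SketchFile extends SketchINode {
    final List<String> blocks = new ArrayList<>();

    SketchFile(long id, String... blockIds) {
        super(id);
        for (String b : blockIds) {
            blocks.add(b);
        }
    }

    @Override
    void destroyAndCollectBlocks(Map<Long, SketchINode> inodeMap,
                                 List<String> collectedBlocks) {
        collectedBlocks.addAll(blocks);
        blocks.clear();
        // Remove this file right away instead of queueing it for later.
        inodeMap.remove(id);
    }
}

class SketchDirectory extends SketchINode {
    final List<SketchINode> children = new ArrayList<>();

    SketchDirectory(long id) {
        super(id);
    }

    @Override
    void destroyAndCollectBlocks(Map<Long, SketchINode> inodeMap,
                                 List<String> collectedBlocks) {
        // Recurse first so every descendant is removed from the map too.
        for (SketchINode child : children) {
            child.destroyAndCollectBlocks(inodeMap, collectedBlocks);
        }
        children.clear();
        inodeMap.remove(id);
    }
}
```

The caller would still have to hold the directory write lock for the whole 
traversal. The sketch only shows that no separate list of removed inodes needs 
to be built.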


> Edit log corruption may still happen even after HDFS-6527
> ---------------------------------------------------------
>
>                 Key: HDFS-6618
>                 URL: https://issues.apache.org/jira/browse/HDFS-6618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6618.patch
>
>
> After HDFS-6527, we have not seen the edit log corruption for weeks on 
> multiple clusters until yesterday. Previously, we would see it within 30 
> minutes on a cluster.
> But the same condition was reproduced even with HDFS-6527.  The only 
> explanation is that the RPC handler thread serving {{addBlock()}} was 
> accessing a stale parent value.  Although nulling out parent is done inside the 
> {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier 
> because there is no "synchronized" block involved in the process.
> I suggest making parent volatile.
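
For illustration only, the suggestion in the description to make parent 
volatile might look roughly like the sketch below. {{SketchNode}} and its 
methods are hypothetical names, not the real HDFS {{INode}} class, and treating 
a null parent as "detached" is an assumption of this sketch.

```java
// Hypothetical, simplified inode; NOT the real HDFS INode class.
class SketchNode {
    // volatile: a write to this field by one thread is visible to
    // subsequent reads of it by other threads, without a synchronized block.
    private volatile SketchNode parent;

    void setParent(SketchNode newParent) {
        this.parent = newParent;
    }

    SketchNode getParent() {
        return parent;
    }

    // In this sketch, a node whose parent has been nulled out is
    // considered unlinked from the tree.
    boolean isDetached() {
        return parent == null;
    }
}
```

A reader such as the handler thread would then check {{isDetached()}} before 
acting on the node.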



--
This message was sent by Atlassian JIRA
(v6.2#6252)
