>Would be great if someone wrote some tools that, given a block ID, tracked
>the life of the file that contained it (including renames of containing
> dirs, etc). Shouldn't be too difficult.

There's a tool for this in MapRed's contrib section under block_forensics. It was released in 21, I believe. It hasn't been kept up to date though, so I'm not sure how functional it still is.
-Jakob

Todd Lipcon wrote:
What's the last audit log entry prior to 2010-11-10 21:42:33,389?

-Todd

On Thu, Nov 11, 2010 at 2:10 PM, David Rosenstrauch <dar...@darose.net <mailto:dar...@darose.net>> wrote:

    Saw a couple more references to that block before the "to delete
     blk" messages:

    2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
    BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
    added to invalidSet of <our ip prefix>.169:50010
    2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
    BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
    added to invalidSet of <our ip prefix>.173:50010
    2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
    BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
    added to invalidSet of <our ip prefix>.176:50010

    Again, I'm not sure why this is happening though.


    BTW, I appreciate your comments below (about it getting moved out of
    the temp directory and then getting removed in another pass).  But I
    grepped the logs as you suggested, and I still don't see how it got
    moved/deleted:

    [r...@hdmaster hadoop-0.20]# grep
    _attempt_201010221550_0418_r_000001_0
    hadoop-hadoop-namenode-hdmaster.log.2010-11-10
    2010-11-10 21:42:28,442 INFO
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
    ugi=root,root,bin,daemon,sys,adm,disk,wheel     ip=/<our ip
    root>.176     cmd=create      src=<our root
    
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
     dst=null        perm=root:supergroup:rw-r--r--

    2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange:
    BLOCK* NameSystem.allocateBlock: <our root
    
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
    blk_-4237880568969698703_13404582
    2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange:
    Removing lease on  file <our root
    
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
    from client DFSClient_attempt_201010221550_0418_r_000001_0
    2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange:
    DIR* NameSystem.completeFile: file <our root
    
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
    is closed by DFSClient_attempt_201010221550_0418_r_000001_0


    I have to say, I'm really quite stumped.  Been looking at this all
    afternoon, and I still have no idea how/why that file got purged.  :-(

    Thanks,

    DR


    On 11/11/2010 02:13 PM, Todd Lipcon wrote:

        Given that it's an MR output, my guess is it got moved out of
        the temporary
        directory when the job "Committed" and then was removed as
        another pass. I'd
        grep for the containing directory name in the audit logs to see
        where it got
        moved to and how it was eventually deleted.

        Would be great if someone wrote some tools that, given a block
        ID, tracked
        the life of the file that contained it (including renames of
        containing
        dirs, etc). Shouldn't be too difficult.

        -Todd

        On Thu, Nov 11, 2010 at 9:38 AM, David
        Rosenstrauch<dar...@darose.net <mailto:dar...@darose.net>>wrote:

            Sorry, I stand corrected.  When I grep the name node logs
            for the block ID

            there are some additional lines:

            [r...@hdmaster hadoop-0.20]# grep 4237880568969698703_13404582
            hadoop-hadoop-namenode-hdmaster.log.2010-11-10

            2010-11-10 21:42:29,802 INFO
            org.apache.hadoop.hdfs.StateChange: BLOCK*
            NameSystem.allocateBlock:<our root
            
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
            blk_-4237880568969698703_13404582
             2010-11-10 21:42:30,355 INFO
            org.apache.hadoop.hdfs.StateChange: BLOCK*
            NameSystem.addStoredBlock: blockMap updated:<our ip
            prefix>.169:50010 is
            added to blk_-4237880568969698703_13404582 size 326522
            2010-11-10 21:42:30,356 INFO
            org.apache.hadoop.hdfs.StateChange: BLOCK*
            NameSystem.addStoredBlock: blockMap updated:<our ip
            prefix>.173:50010 is
            added to blk_-4237880568969698703_13404582 size 326522
            2010-11-10 21:42:30,357 INFO
            org.apache.hadoop.hdfs.StateChange: BLOCK*
            NameSystem.addStoredBlock: blockMap updated:<our ip
            prefix>.176:50010 is
            added to blk_-4237880568969698703_13404582 size 326522
            2010-11-10 21:42:33,585 INFO
            org.apache.hadoop.hdfs.StateChange: BLOCK* ask
            <our ip prefix>.176:50010 to delete
             blk_-4237880568969698703_13404582
            2010-11-10 21:42:42,598 INFO
            org.apache.hadoop.hdfs.StateChange: BLOCK* ask
            <our ip prefix>.173:50010 to delete
             blk_719995703651674050_13393533
            blk_2844104197159113873_13393516
            blk_6711472063024831893_13393527
            blk_240592081553699046_13393054 blk_6933038421638165347_13393232
            blk_-1288897671244436593_13393351
            blk_1408176921409258101_13393152
            blk_-4237880568969698703_13404582
            blk_-4188858742780299895_13393410
            blk_-2670318277937301225_13393450
            blk_-1858397614006984730_13393480
            blk_6292330466651227068_13393505
            2010-11-10 21:42:42,598 INFO
            org.apache.hadoop.hdfs.StateChange: BLOCK* ask
            <our ip prefix>.169:50010 to delete
             blk_719995703651674050_13393533
            blk_6711472063024831893_13393527
            blk_-514463419649319075_13404582
            blk_-1980639884885154260_13404582
            blk_240592081553699046_13393054
            blk_8520588884031249451_13393428
            blk_-6620589068796293265_13393089
            blk_6933038421638165347_13393232
            blk_-1288897671244436593_13393351
            blk_-3581082286710707012_13393556
            blk_1408176921409258101_13393152
            blk_-1858397614006984730_13393480
            blk_6292330466651227068_13393505
            blk_2844104197159113873_13393516
            blk_-4237880568969698703_13404582
            blk_-4188858742780299895_13393410
            blk_-2670318277937301225_13393450


            I'm still not clear why those deletes are happening though.
             Will read
            through the logs again at those locations.  But if anyone's
            got any ideas,
            pls chime in.

            DR




--
Todd Lipcon
Software Engineer, Cloudera

Reply via email to