>Would be great if someone wrote some tools that, given a block ID, tracked
>the life of the file that contained it (including renames of containing
> dirs, etc). Shouldn't be too difficult.
There's a tool for this in MapReduce's contrib section under
block_forensics. It was released in 0.21, I believe. It hasn't been kept
up to date, though, so I'm not sure how functional it still is.
-Jakob
Todd Lipcon wrote:
What's the last audit log entry prior to 2010-11-10 21:42:33,389?
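For anyone wanting to answer that without paging through the log by hand, here is a rough Python sketch. It assumes every entry starts with a timestamp in the same "YYYY-MM-DD HH:MM:SS,mmm" form as the lines quoted in this thread and that audit entries contain "FSNamesystem.audit:". The function name last_audit_before is made up for illustration.

```python
from datetime import datetime

# Timestamp layout assumed from the log lines quoted in this thread.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2010-11-10 21:42:33,389")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def last_audit_before(lines, cutoff):
    """Return the latest audit entry strictly earlier than `cutoff`."""
    cutoff_ts = datetime.strptime(cutoff, TS_FORMAT)
    best_line, best_ts = None, None
    for line in lines:
        if "FSNamesystem.audit:" not in line:
            continue  # skip StateChange and other non-audit entries
        ts = parse_ts(line)
        if ts is None or ts >= cutoff_ts:
            continue
        # Keep the latest match even if the log is not strictly ordered.
        if best_ts is None or ts >= best_ts:
            best_line, best_ts = line.rstrip("\n"), ts
    return best_line
```

Passing an open file handle for the namenode log as `lines` and "2010-11-10 21:42:33,389" as `cutoff` should return the entry in question, provided the log really follows the assumed layout.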
-Todd
On Thu, Nov 11, 2010 at 2:10 PM, David Rosenstrauch
<dar...@darose.net> wrote:
Saw a couple more references to that block before the "to delete
blk" messages:
2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
added to invalidSet of <our ip prefix>.169:50010
2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
added to invalidSet of <our ip prefix>.173:50010
2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
added to invalidSet of <our ip prefix>.176:50010
Again, I'm not sure why this is happening though.
BTW, I appreciate your comments below (about it getting moved out of
the temp directory and then getting removed in another pass). But I
grepped the logs as you suggested, and I still don't see how it got
moved/deleted:
[r...@hdmaster hadoop-0.20]# grep _attempt_201010221550_0418_r_000001_0 hadoop-hadoop-namenode-hdmaster.log.2010-11-10
2010-11-10 21:42:28,442 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip
root>.176 cmd=create src=<our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
dst=null perm=root:supergroup:rw-r--r--
2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.allocateBlock: <our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
blk_-4237880568969698703_13404582
2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange:
Removing lease on file <our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
from client DFSClient_attempt_201010221550_0418_r_000001_0
2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange:
DIR* NameSystem.completeFile: file <our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
is closed by DFSClient_attempt_201010221550_0418_r_000001_0
I have to say, I'm really quite stumped. Been looking at this all
afternoon, and I still have no idea how/why that file got purged. :-(
Thanks,
DR
On 11/11/2010 02:13 PM, Todd Lipcon wrote:
Given that it's an MR output, my guess is it got moved out of the
temporary directory when the job "Committed" and then was removed in
another pass. I'd grep for the containing directory name in the audit
logs to see where it got moved to and how it was eventually deleted.
Would be great if someone wrote some tools that, given a block ID,
tracked the life of the file that contained it (including renames of
containing dirs, etc). Shouldn't be too difficult.
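A minimal Python sketch of what such a tool could look like, offered as an illustration rather than a finished utility. It assumes the namenode log contains both the "NameSystem.allocateBlock" lines and the "FSNamesystem.audit" lines in the layout quoted in this thread, and that renames and deletes show up as cmd=rename and cmd=delete. The names trace_block and is_under are hypothetical. It does not handle a rename into an existing directory, where the final path would gain the source's basename.

```python
import re

# Patterns assume the log layout quoted in this thread.
ALLOC_RE = re.compile(
    r"NameSystem\.allocateBlock: (?P<path>.+)\. (?P<block>blk_-?\d+)"
)
AUDIT_RE = re.compile(
    r"FSNamesystem\.audit: .*?cmd=(?P<cmd>\S+)\s+src=(?P<src>.*?)"
    r"\s+dst=(?P<dst>.*?)\s+perm="
)
TS_LENGTH = len("2010-11-10 21:42:33,389")


def is_under(path, ancestor):
    """True if `path` equals `ancestor` or lies somewhere beneath it."""
    ancestor = ancestor.rstrip("/")
    return path == ancestor or path.startswith(ancestor + "/")


def trace_block(lines, block_id):
    """Follow the file holding `block_id` (e.g. "blk_-42", no generation
    stamp) through renames until it is deleted.

    Returns a list of (timestamp, action, path) tuples.
    """
    current = None  # current path of the file that owns the block
    events = []
    for line in lines:
        line = line.rstrip("\n")
        ts = line[:TS_LENGTH]
        if current is None:
            # Step 1: find which file the block was allocated to.
            m = ALLOC_RE.search(line)
            if m and m.group("block") == block_id:
                current = m.group("path")
                events.append((ts, "allocate", current))
            continue
        # Step 2: watch audit entries that touch the file or a parent dir.
        m = AUDIT_RE.search(line)
        if not m or not is_under(current, m.group("src")):
            continue
        cmd, src, dst = m.group("cmd"), m.group("src"), m.group("dst")
        if cmd == "rename":
            # Re-root the tracked path under the rename destination.
            current = dst.rstrip("/") + current[len(src.rstrip("/")):]
            events.append((ts, "rename", current))
        elif cmd == "delete":
            events.append((ts, "delete", src))
            break
    return events
```

Feeding it an open handle on the namenode log plus the block ID should list the allocation, each rename of the file or a containing directory, and the delete that finally removed it, if the audit log recorded one.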
-Todd
On Thu, Nov 11, 2010 at 9:38 AM, David Rosenstrauch
<dar...@darose.net> wrote:
Sorry, I stand corrected. When I grep the name node logs for the
block ID there are some additional lines:
[r...@hdmaster hadoop-0.20]# grep 4237880568969698703_13404582 hadoop-hadoop-namenode-hdmaster.log.2010-11-10
2010-11-10 21:42:29,802 INFO
org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:<our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
blk_-4237880568969698703_13404582
2010-11-10 21:42:30,355 INFO
org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip
prefix>.169:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:30,356 INFO
org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip
prefix>.173:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:30,357 INFO
org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip
prefix>.176:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:33,585 INFO
org.apache.hadoop.hdfs.StateChange: BLOCK* ask
<our ip prefix>.176:50010 to delete
blk_-4237880568969698703_13404582
2010-11-10 21:42:42,598 INFO
org.apache.hadoop.hdfs.StateChange: BLOCK* ask
<our ip prefix>.173:50010 to delete
blk_719995703651674050_13393533
blk_2844104197159113873_13393516
blk_6711472063024831893_13393527
blk_240592081553699046_13393054 blk_6933038421638165347_13393232
blk_-1288897671244436593_13393351
blk_1408176921409258101_13393152
blk_-4237880568969698703_13404582
blk_-4188858742780299895_13393410
blk_-2670318277937301225_13393450
blk_-1858397614006984730_13393480
blk_6292330466651227068_13393505
2010-11-10 21:42:42,598 INFO
org.apache.hadoop.hdfs.StateChange: BLOCK* ask
<our ip prefix>.169:50010 to delete
blk_719995703651674050_13393533
blk_6711472063024831893_13393527
blk_-514463419649319075_13404582
blk_-1980639884885154260_13404582
blk_240592081553699046_13393054
blk_8520588884031249451_13393428
blk_-6620589068796293265_13393089
blk_6933038421638165347_13393232
blk_-1288897671244436593_13393351
blk_-3581082286710707012_13393556
blk_1408176921409258101_13393152
blk_-1858397614006984730_13393480
blk_6292330466651227068_13393505
blk_2844104197159113873_13393516
blk_-4237880568969698703_13404582
blk_-4188858742780299895_13393410
blk_-2670318277937301225_13393450
I'm still not clear why those deletes are happening though. Will read
through the logs again at those locations. But if anyone's got any
ideas, pls chime in.
DR
--
Todd Lipcon
Software Engineer, Cloudera