[
https://issues.apache.org/jira/browse/HDFS-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz Wo Nicholas Sze resolved HDFS-1336.
---------------------------------------
Resolution: Not a Problem
I guess that this is not a problem anymore. Please feel free to reopen this if
I am wrong. Resolving ...
> TruncateBlock does not update in-memory information correctly
> -------------------------------------------------------------
>
> Key: HDFS-1336
> URL: https://issues.apache.org/jira/browse/HDFS-1336
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 0.20-append
> Reporter: Thanh Do
>
> - Component: data node
>
> - Version: 0.20-append
>
> - Summary: we found a case where a block is truncated during updateBlock but
> the length recorded in ongoingCreates is not updated, which leads to a failed
> append.
>
> - Setup:
> + # disks / datanode = 3
> + # failures = 2
> + failure type = crash
> + when/where failures happen = (see below)
>
> - Details:
> 1) The client writes to dn1-dn2-dn3. The write succeeds.
> 2) Now the client tries to append. It first calls dn1.recoverBlock(), and this
> recoverBlock succeeds.
> 3) Suppose the pipeline is dn3-dn2-dn1. The client sends a packet to dn3.
> dn3 forwards the packet to dn2 and writes it to its own disk (i.e. dn3's disk).
> Now *dn2 crashes*, so dn1 never receives this packet.
> 4) The client calls dn1.recoverBlock() again, this time with dn3-dn1 in the
> pipeline.
> dn1 then calls dn3.startBlockRecovery(), which terminates the writer thread on
> dn3, gets the *in-memory* metadata info of the block, and verifies that info
> against the actual file on disk.
> dn3 maintains an in-memory data structure called *ongoingCreates* to record
> information about blocks that are currently being created. Once a block is
> finalized, its info is removed from *ongoingCreates*.
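>
> A minimal, self-contained sketch of this bookkeeping (not the actual
> 0.20-append FSDataset code; class and method names such as
> OngoingCreatesModel and recordWrite are illustrative):
>
>   import java.util.HashMap;
>   import java.util.Map;
>
>   // Simplified stand-in for the per-datanode "ongoingCreates" map.
>   class OngoingCreatesModel {
>     // Keyed by block id; the real map is keyed by Block and holds richer state.
>     private final Map<Long, Long> inMemoryLength = new HashMap<Long, Long>();
>
>     // A block under construction gets an entry when its writer starts.
>     void startCreate(long blockId) {
>       inMemoryLength.put(blockId, 0L);
>     }
>
>     // Each successful write advances the in-memory length.
>     void recordWrite(long blockId, long newLength) {
>       if (inMemoryLength.containsKey(blockId)) {
>         inMemoryLength.put(blockId, newLength);
>       }
>     }
>
>     // Finalizing removes the entry; later recovery then relies on disk state.
>     void finalizeBlock(long blockId) {
>       inMemoryLength.remove(blockId);
>     }
>
>     // Block recovery consults this map first if the block is still here.
>     Long getInMemoryLength(long blockId) {
>       return inMemoryLength.get(blockId);
>     }
>   }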
>
> Now suppose that by the time dn3 receives the startBlockRecovery() request
> from dn1, it has:
> + finished writing the data to disk (hence the block length on disk is 1024)
> + set the visible in-memory length (hence the in-memory length is also 1024)
> but it *has not* finalized the block, so the block info is still in
> *ongoingCreates*.
> (Note: the interruption of the writer thread means the finalization never
> happens.)
>
> Because of all of the above, dn3 gives dn1 info about the block with length
> 1024.
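>
> Using the OngoingCreatesModel sketch above, dn3's state at this point would
> look roughly like this (illustrative only; the block id is hypothetical):
>
>   class Dn3StateAtRecovery {
>     public static void main(String[] args) {
>       long blockId = 42L;                          // hypothetical block id
>       OngoingCreatesModel ongoingCreates = new OngoingCreatesModel();
>       ongoingCreates.startCreate(blockId);         // writer thread started
>       ongoingCreates.recordWrite(blockId, 1024L);  // last packet written to disk
>       // finalizeBlock() is never reached because the writer thread was
>       // interrupted, so the in-memory length stays at 1024.
>       System.out.println(ongoingCreates.getInMemoryLength(blockId));  // 1024
>     }
>   }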
>
> 5) Now dn1 calls its own startBlockRecovery(), which succeeds (because its
> on-disk file length and in-memory length match; both are 512 bytes).
>
> 6) Now dn1 has a sync list (block_X_GS1 at dn1 with length 512, block_X_GS1
> at dn3 with length 1024).
> It needs to make sure all datanodes in the pipeline agree on the new GS and
> length.
> dn1 calls NN.nextGS() to get the new GS2. It forms the new block_X_GS2 with
> length 512 (the minimum of the two reported lengths), and calls updateBlock on
> dn3 and on itself.
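>
> A small sketch of how the primary arrives at 512 here, assuming (consistent
> with the lengths in this report) that it takes the minimum replica length from
> the sync list; names are hypothetical:
>
>   import java.util.Arrays;
>   import java.util.List;
>
>   class SyncListSketch {
>     // Pick the length all replicas can agree on: the smallest one reported.
>     static long agreedLength(List<Long> reportedLengths) {
>       long min = Long.MAX_VALUE;
>       for (long len : reportedLengths) {
>         min = Math.min(min, len);
>       }
>       return min;
>     }
>
>     public static void main(String[] args) {
>       // dn1 reported 512, dn3 reported 1024 -> block_X_GS2 gets length 512.
>       System.out.println(agreedLength(Arrays.asList(512L, 1024L)));
>     }
>   }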
>
> 7) dn3, on receiving the updateBlock request from dn1, does the following:
> + renames the block from block_X_GS1 ==> block_X_GS2
> + truncates the block file length from 1024 to 512
> (but, and here is the key, it *does not update the length of the block kept
> in ongoingCreates*)
> + returns to dn1 successfully
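>
> A sketch of the sequence just described, with hypothetical names
> (UpdateBlockSketch, inMemoryLengths); the commented-out last step is the
> update this report says is missing:
>
>   import java.io.File;
>   import java.io.IOException;
>   import java.io.RandomAccessFile;
>   import java.util.Map;
>
>   class UpdateBlockSketch {
>     static void updateBlock(File oldBlockFile, File newBlockFile, long newLength,
>                             Map<Long, Long> inMemoryLengths, long blockId)
>         throws IOException {
>       // 1. rename block_X_GS1 ==> block_X_GS2
>       if (!oldBlockFile.renameTo(newBlockFile)) {
>         throw new IOException("rename failed: " + oldBlockFile);
>       }
>       // 2. truncate the on-disk replica from 1024 to 512 bytes
>       RandomAccessFile raf = new RandomAccessFile(newBlockFile, "rw");
>       try {
>         raf.setLength(newLength);
>       } finally {
>         raf.close();
>       }
>       // 3. the step this report says is missing: the in-memory entry still
>       //    says 1024 because nothing like the following is done:
>       // inMemoryLengths.put(blockId, newLength);
>     }
>   }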
>
> 8) Now dn1 calls its own updateBlock and *crashes*.
>
> 9) From the client's point of view, dn1.recoverBlock fails.
> It retries dn1.recoverBlock six times, then declares dn1 bad.
>
> 10) The client now calls dn3.recoverBlock().
>
> 11) dn3 in turn calls its own startBlockRecovery() to:
> + interrupt the block writer threads, if any
> + getBlockMetadataInfo (as part of forming the syncList, and for updateBlock
> later)
> > it first looks into ongoingCreates to see whether the block info is there,
> and finds it (because the block was never finalized).
> Hence, the in-memory length is 1024 (even though truncateBlock was
> called earlier).
> > it then verifies the in-memory length (1024) against the on-disk length
> (512).
> Hence, the *unmatched file length exception*.
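>
> A sketch of the check that fails at this point, assuming recovery prefers the
> in-memory (ongoingCreates) length for an un-finalized block; names are
> hypothetical:
>
>   import java.io.File;
>   import java.io.IOException;
>
>   class BlockRecoverySketch {
>     static long verifiedLength(File blockFile, Long inMemoryLength)
>         throws IOException {
>       long onDiskLength = blockFile.length();   // 512 after the truncate in step 7)
>       if (inMemoryLength != null && inMemoryLength.longValue() != onDiskLength) {
>         // The stale in-memory value (1024) disagrees with the truncated
>         // file (512): the "unmatched file length" failure described above.
>         throw new IOException("Block length mismatch: in-memory=" + inMemoryLength
>             + " on-disk=" + onDiskLength);
>       }
>       return inMemoryLength != null ? inMemoryLength.longValue() : onDiskLength;
>     }
>   }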
>
> 12) From the client's point of view, recoverBlock fails because *all data
> nodes are bad*.
> The client retries dn3.recoverBlock five more times and gets the same
> exception.
> Hence, the append fails.
>
> Note:
> - To fix it, I think that when truncating the file we also need to update
> ongoingCreates (see the sketch after these notes), but I am not sure whether
> a fix like this would affect any other workload.
> - Interestingly, NN.leaseRecovery fails because of the exact same exception at
> dn3.
> - Until the dead node restarts and NN.leaseRecovery is triggered again, the NN
> is still the lease holder of the file.
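>
> A hedged sketch of the fix suggested in the first note above: when truncating
> the replica, also refresh the length recorded for the still-open block. Method
> and parameter names are hypothetical, not the actual FSDataset API.
>
>   import java.io.File;
>   import java.io.IOException;
>   import java.io.RandomAccessFile;
>   import java.util.Map;
>
>   class TruncateWithBookkeeping {
>     static void truncateBlock(File blockFile, long newLength,
>                               Map<Long, Long> inMemoryLengths, long blockId)
>         throws IOException {
>       RandomAccessFile raf = new RandomAccessFile(blockFile, "rw");
>       try {
>         raf.setLength(newLength);
>       } finally {
>         raf.close();
>       }
>       // Keep the in-memory view consistent with the disk, so a later
>       // startBlockRecovery() sees 512 rather than the stale 1024.
>       if (inMemoryLengths.containsKey(blockId)) {
>         inMemoryLengths.put(blockId, newLength);
>       }
>     }
>   }
>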
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do ([email protected]) and
> Haryadi Gunawi ([email protected]).
--
This message was sent by Atlassian JIRA
(v6.2#6252)