[jira] [Commented] (HDFS-15209) Lease recovery: namenode not able to commitBlockSynchronization if client comes back and closes the file beforehand

Hadoop QA (Jira) Fri, 06 Mar 2020 10:21:12 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053675#comment-17053675
 ]


Hadoop QA commented on HDFS-15209:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} | {color:red} HDFS-15209 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15209 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995818/fix_genstamp.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28908/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Lease recovery: namenode not able to commitBlockSynchronization if client 
> comes back and closes the file beforehand
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15209
>                 URL: https://issues.apache.org/jira/browse/HDFS-15209
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Ye Ni
>            Priority: Major
>         Attachments: fix_genstamp.patch
>
>
> We observed a case where the client closes the file after soft lease
> recovery has already started but before the namenode runs
> commitBlockSynchronization.
> This causes commitBlockSynchronization to fail with the error below, because
> it requires that either the file is not closed or the last block is not in
> the complete state.
> As a result, we end up with corrupted replicas due to a genstamp mismatch,
> since the datanodes may have finished block recovery with a new genstamp.
> This can happen when a client stalls for a long time on a write and comes
> back after lease recovery has already been triggered by a
> write/append/truncate request from another client.
> Repro steps:
>  # Client #1 finishes writing a file, but hasn't closed it yet.
>  # Client #1 doesn't renew its lease for a soft lease period.
>  # Another client, #2, appends to the same file.
>  # Soft lease recovery begins.
>  # Block recovery in datanodes finishes.
>  # Client #1 comes back to close the file.
>  # Close succeeds since Client #1 still holds the lease (the lease is revoked
> as the last step in soft recovery).
>  # Namenode tries to commitBlockSynchronization and fails with the error log
> below.
>  # Namenode and datanodes have different genstamps for this file, resulting
> in a corrupted block.
> Fix:
> Check the state of the last block when completing the file. If it is under
> recovery, lease recovery has started but the namenode has not yet run
> commitBlockSynchronization.
> In this case, don't complete the file.
>  
> {code:java}
> 2020-02-22 22:47:04,698 INFO [IPC Server handler 32 on 8020] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> commitBlockSynchronization(oldBlock=BP-269461681-10.65.230.22-1554624547020:blk_2642650669_3063725879,
>  newgenerationstamp=3063765480, newlength=262144000, 
> newtargets=[25.65.180.47:10010, 25.65.161.162:10010, 100.101.88.162:10010], 
> closeFile=true, deleteBlock=false)
> 2020-02-22 22:47:04,698 DEBUG [IPC Server handler 32 on 8020] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected block 
> (=BP-269461681-10.65.230.22-1554624547020:blk_2642650669_3063725879) since 
> the file 
> (=132269111992796228.data.637180347427616457.tmp.132269136349107823.copying) 
> is not under construction
> {code}
>  
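The repro steps and the proposed fix quoted above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `LeaseRecoverySketch`, `FileModel`, `BlockState` and every method below are hypothetical names invented for illustration, not the actual FSNamesystem code and not the contents of fix_genstamp.patch. The two genstamp values are taken from the log above.

```java
public class LeaseRecoverySketch {

    /** Simplified block states (assumption: real HDFS block states are richer). */
    public enum BlockState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMPLETE }

    /** Hypothetical stand-in for the namenode's view of a file and its last block. */
    public static final class FileModel {
        boolean underConstruction = true;
        BlockState lastBlock = BlockState.UNDER_CONSTRUCTION;
        long genStamp;

        FileModel(long genStamp) {
            this.genStamp = genStamp;
        }
    }

    /** Soft lease recovery begins: the last block is marked as under recovery. */
    public static void startLeaseRecovery(FileModel f) {
        f.lastBlock = BlockState.UNDER_RECOVERY;
    }

    /**
     * Client close. With checkRecovery set, the close is refused while the
     * last block is still under recovery, as the issue proposes.
     */
    public static boolean completeFile(FileModel f, boolean checkRecovery) {
        if (!f.underConstruction) {
            return false;
        }
        if (checkRecovery && f.lastBlock == BlockState.UNDER_RECOVERY) {
            return false; // recovery started, commitBlockSynchronization still pending
        }
        f.lastBlock = BlockState.COMPLETE;
        f.underConstruction = false;
        return true;
    }

    /** Fails if the file was already closed, mirroring the error in the log. */
    public static boolean commitBlockSynchronization(FileModel f, long newGenStamp) {
        if (!f.underConstruction) {
            return false; // "the file ... is not under construction"
        }
        f.genStamp = newGenStamp;
        f.lastBlock = BlockState.COMPLETE;
        f.underConstruction = false;
        return true;
    }

    /** Replays the repro steps and returns the genstamp the namenode ends up with. */
    public static long run(boolean checkRecovery) {
        FileModel f = new FileModel(3063725879L);     // genstamp before recovery
        startLeaseRecovery(f);                        // soft lease recovery begins
        long dataNodeGenStamp = 3063765480L;          // datanodes finish block recovery
        completeFile(f, checkRecovery);               // client #1 comes back and closes
        commitBlockSynchronization(f, dataNodeGenStamp);
        return f.genStamp;
    }

    public static void main(String[] args) {
        System.out.println("without check: namenode genstamp = " + run(false));
        System.out.println("with check:    namenode genstamp = " + run(true));
    }
}
```

In this model, `run(false)` leaves the namenode on the old genstamp while the datanodes have moved to the new one, which is the mismatch described in the issue. `run(true)` rejects the late close, so commitBlockSynchronization can still apply the new genstamp.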



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
