[ https://issues.apache.org/jira/browse/HDFS-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053675#comment-17053675 ]
Hadoop QA commented on HDFS-15209:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-15209 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15209 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995818/fix_genstamp.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28908/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Lease recovery: namenode not able to commitBlockSynchronization if client
> comes back and closes the file beforehand
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-15209
> URL: https://issues.apache.org/jira/browse/HDFS-15209
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Ye Ni
> Priority: Major
> Attachments: fix_genstamp.patch
>
>
> We observed a case where the client closes the file after soft lease recovery
> has already started but before the namenode runs commitBlockSynchronization.
> This causes commitBlockSynchronization to fail with the error below, because it
> requires either that the file is not closed or that the last block is not in
> the complete state.
> As a result, we end up with corrupt replicas due to a genstamp mismatch, since
> the datanodes may already have finished block recovery with a new genstamp.
> This can happen when a client stalls for a long time during a write and comes
> back after lease recovery has already been triggered by a write/append/truncate
> request from another client.
> Repro steps:
> # Client #1 finishes writing a file but has not closed it yet.
> # Client #1 does not renew its lease for the soft lease period.
> # Another client, #2, appends to the same file.
> # Soft lease recovery begins.
> # Block recovery on the datanodes finishes.
> # Client #1 comes back to close the file.
> # Close succeeds because Client #1 still holds the lease (the lease is revoked
> only as the last step of soft lease recovery).
> # The namenode attempts commitBlockSynchronization and fails with the error
> log below.
> # The namenode and the datanodes now have different genstamps for this file's
> last block, resulting in a corrupt block.
> Fix:
> Check the state of the last block when completing the file. If it is under
> recovery, lease recovery has started but the namenode has not yet run
> commitBlockSynchronization.
> In this case, do not complete the file.
>
> {code:java}
> 2020-02-22 22:47:04,698 INFO [IPC Server handler 32 on 8020]
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(oldBlock=BP-269461681-10.65.230.22-1554624547020:blk_2642650669_3063725879,
> newgenerationstamp=3063765480, newlength=262144000,
> newtargets=[25.65.180.47:10010, 25.65.161.162:10010, 100.101.88.162:10010],
> closeFile=true, deleteBlock=false)
> 2020-02-22 22:47:04,698 DEBUG [IPC Server handler 32 on 8020]
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected block
> (=BP-269461681-10.65.230.22-1554624547020:blk_2642650669_3063725879) since
> the file
> (=132269111992796228.data.637180347427616457.tmp.132269136349107823.copying)
> is not under construction
> {code}
>
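The repro steps can be condensed into a small state model of the unpatched behavior. This is a hypothetical, heavily simplified Java sketch: `RaceModel`, `RaceDemo`, their methods and the three-value `BlockState` enum are illustrative names, not Hadoop APIs, and real HDFS tracks far more state. It assumes the namenode keeps the old genstamp on the block until commitBlockSynchronization succeeds; the two genstamps are taken from the log in the description.

```java
// Hypothetical, heavily simplified model of the HDFS-15209 race (unpatched).
// RaceModel and its methods are illustrative only, not Hadoop APIs.
class RaceModel {
    // Simplified stand-in for the last block's state on the namenode.
    enum BlockState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMPLETE }

    // Step 1: the file is written but still open (under construction).
    boolean fileUnderConstruction = true;
    BlockState lastBlockState = BlockState.UNDER_CONSTRUCTION;
    long nnGenStamp; // genstamp the namenode records for the last block
    long dnGenStamp; // genstamp the datanode replicas end up with

    RaceModel(long genStamp) {
        nnGenStamp = genStamp;
        dnGenStamp = genStamp;
    }

    // Steps 2-4: client #2's append triggers soft lease recovery. Assumption: the
    // namenode keeps the old genstamp until commitBlockSynchronization succeeds.
    void startLeaseRecovery() {
        lastBlockState = BlockState.UNDER_RECOVERY;
    }

    // Step 5: datanodes finish block recovery with the new genstamp.
    void datanodesFinishRecovery(long recoveryGenStamp) {
        dnGenStamp = recoveryGenStamp;
    }

    // Steps 6-7: client #1 still holds the lease, so close succeeds without
    // checking whether the last block is under recovery.
    boolean clientClose() {
        fileUnderConstruction = false;
        lastBlockState = BlockState.COMPLETE;
        return true;
    }

    // Step 8: rejected like the "is not under construction" entry in the log.
    boolean commitBlockSynchronization(long newGenStamp) {
        if (!fileUnderConstruction) {
            return false;
        }
        nnGenStamp = newGenStamp;
        fileUnderConstruction = false;
        lastBlockState = BlockState.COMPLETE;
        return true;
    }

    // Step 9: a genstamp mismatch means the replicas are treated as corrupt.
    boolean replicasCorrupt() {
        return nnGenStamp != dnGenStamp;
    }
}

class RaceDemo {
    public static void main(String[] args) {
        // Old and new genstamps taken from the commitBlockSynchronization log line.
        RaceModel m = new RaceModel(3063725879L);
        m.startLeaseRecovery();
        m.datanodesFinishRecovery(3063765480L);
        m.clientClose();
        boolean committed = m.commitBlockSynchronization(3063765480L);
        // Prints: committed=false, corrupt=true
        System.out.println("committed=" + committed + ", corrupt=" + m.replicasCorrupt());
    }
}
```

Running `RaceDemo` prints `committed=false, corrupt=true`, which corresponds to steps 8 and 9 of the repro.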
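The proposed fix can be sketched as a small state model. This is a hypothetical illustration, not the attached fix_genstamp.patch: `PatchedModel`, `PatchedDemo`, `completeFile()` and the `BlockState` enum are illustrative names, not Hadoop APIs. The only rule it encodes is the one in the description: while the last block is under recovery, completing the file is refused, so commitBlockSynchronization can still close the file with the recovered genstamp.

```java
// Hypothetical sketch of the proposed fix, not the attached fix_genstamp.patch.
// PatchedModel and its methods are illustrative only, not Hadoop APIs.
class PatchedModel {
    enum BlockState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMPLETE }

    boolean fileUnderConstruction = true;
    BlockState lastBlockState = BlockState.UNDER_CONSTRUCTION;
    long nnGenStamp; // genstamp the namenode records for the last block
    long dnGenStamp; // genstamp the datanode replicas end up with

    PatchedModel(long genStamp) {
        nnGenStamp = genStamp;
        dnGenStamp = genStamp;
    }

    // Lease recovery started; commitBlockSynchronization has not run yet.
    void startLeaseRecovery() {
        lastBlockState = BlockState.UNDER_RECOVERY;
    }

    void datanodesFinishRecovery(long recoveryGenStamp) {
        dnGenStamp = recoveryGenStamp;
    }

    // The fix: check the last block's state before completing the file.
    boolean completeFile() {
        if (lastBlockState == BlockState.UNDER_RECOVERY) {
            return false; // leave the file open so recovery can finish it
        }
        fileUnderConstruction = false;
        lastBlockState = BlockState.COMPLETE;
        return true;
    }

    // Succeeds only while the file is still under construction.
    boolean commitBlockSynchronization(long newGenStamp) {
        if (!fileUnderConstruction) {
            return false;
        }
        nnGenStamp = newGenStamp;
        fileUnderConstruction = false;
        lastBlockState = BlockState.COMPLETE;
        return true;
    }
}

class PatchedDemo {
    public static void main(String[] args) {
        PatchedModel m = new PatchedModel(3063725879L);
        m.startLeaseRecovery();
        m.datanodesFinishRecovery(3063765480L);
        boolean closed = m.completeFile(); // refused: last block under recovery
        boolean committed = m.commitBlockSynchronization(3063765480L);
        // Prints: closed=false, committed=true, genstampsMatch=true
        System.out.println("closed=" + closed + ", committed=" + committed
                + ", genstampsMatch=" + (m.nnGenStamp == m.dnGenStamp));
    }
}
```

How client #1 reacts to the refused close is outside this sketch; the point is only that the namenode and the datanodes agree on the genstamp afterwards.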
--
This message was sent by Atlassian Jira
(v8.3.4#803005)