[ 
https://issues.apache.org/jira/browse/HDFS-16322?focusedWorklogId=684861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684861
 ]

ASF GitHub Bot logged work on HDFS-16322:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Nov/21 17:28
            Start Date: 22/Nov/21 17:28
    Worklog Time Spent: 10m 
      Work Description: Nsupyq opened a new pull request #3705:
URL: https://github.com/apache/hadoop/pull/3705


   
   ### Description of PR
   This PR fixes [HDFS-16322](https://issues.apache.org/jira/browse/HDFS-16322).
   
   The NameNode implementation of ClientProtocol.truncate(...) can cause data 
loss. If the DFSClient drops the first response of a truncate RPC call, the 
retried call will truncate the file again and cause data loss. Specifically, 
under concurrency, after the first execution of truncate(...), concurrent 
requests from other clients may append new data and change the file length. 
When truncate(...) is retried after that, it truncates the file again, which 
causes data loss.
   
   This patch uses the retry cache to avoid such data loss. When the truncate 
operation is applied for the first time, the status of the operation and the 
server's return value are recorded in the retry cache. If the truncate is 
retried, the server reads the result directly from the retry cache and 
performs no operation that could break idempotence.
   
   See [HDFS-16322](https://issues.apache.org/jira/browse/HDFS-16322) 
description for more details.
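
   The retry-cache idea above can be sketched in a few lines of plain Java. 
This is a hypothetical simplification, not Hadoop's actual RetryCache API: the 
first truncate records its result under the RPC call id, and a retry with the 
same call id replays the recorded result instead of truncating again, even if 
a concurrent append has changed the file length in between.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a retry-cache-guarded truncate (names invented,
// not the real Hadoop classes). The first execution of a call id records
// its result; a retry with the same call id replays the cached result and
// performs no further truncation.
class TruncateRetryCacheSketch {
    // result of each completed truncate, keyed by RPC call id
    private final Map<Long, Boolean> retryCache = new HashMap<>();
    long fileLength;

    TruncateRetryCacheSketch(long initialLength) {
        this.fileLength = initialLength;
    }

    /** Truncate the file to newLength; callId identifies the RPC call. */
    boolean truncate(long callId, long newLength) {
        Boolean cached = retryCache.get(callId);
        if (cached != null) {
            // Retry of an already-applied call: replay the recorded
            // result without touching the file again.
            return cached;
        }
        boolean result = false;
        if (newLength < fileLength) {
            fileLength = newLength;
            result = true; // truncated in place
        }
        retryCache.put(callId, result);
        return result;
    }

    /** A concurrent append from another client grows the file. */
    void append(long extra) {
        fileLength += extra;
    }
}
```

   Without the cache lookup, the retried call would see the appended length 
and truncate the file a second time, losing the newly appended data.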
   
   ### How was this patch tested?
   
   We added a new unit test for the idempotency of the truncate operation under 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java.
 The test issues a truncate operation on an existing file, removes the whole 
file, and then retries the previous operation. If truncate is idempotent, the 
retry returns successfully instead of throwing an exception about truncating a 
non-existing file.
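
   The flow of that test can be sketched as follows. This is a self-contained 
illustration with invented names, not the actual TestNamenodeRetryCache code: 
truncate a file, delete it, then retry the same call id. With the retry cache, 
the retry replays the recorded result instead of attempting to truncate a file 
that no longer exists.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical namespace sketch (not the real NameNode classes) showing
// why the retry cache makes the described test pass: a retried truncate
// replays its cached result even after the file has been deleted.
class IdempotentTruncateTestSketch {
    private final Map<Long, Boolean> retryCache = new HashMap<>();
    private final Map<String, Long> files = new HashMap<>();

    void create(String path, long length) { files.put(path, length); }
    void delete(String path) { files.remove(path); }

    boolean truncate(long callId, String path, long newLength) {
        Boolean cached = retryCache.get(callId);
        if (cached != null) {
            return cached; // replay cached result, no re-execution
        }
        Long len = files.get(path);
        if (len == null) {
            // without the cache hit above, a retry after delete lands here
            throw new IllegalStateException(
                "truncate on non-existing file: " + path);
        }
        boolean result = false;
        if (newLength < len) {
            files.put(path, newLength);
            result = true;
        }
        retryCache.put(callId, result);
        return result;
    }
}
```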
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 684861)
    Remaining Estimate: 0h
            Time Spent: 10m

> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-16322
>                 URL: https://issues.apache.org/jira/browse/HDFS-16322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>         Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the test case. Finally, the "assertFileLength(p, n+newLength)" at line 
> 199 of TestFileTruncate.java will fail, because the retry of truncate() 
> changes the file size and causes data loss.
>            Reporter: nhaorand
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: TestFileTruncate.java, h16322_20211116.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If the DFSClient drops the first response of a truncate RPC call, the 
> retried call will truncate the file again and cause data loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the 
> file is already being truncated to the same length. However, under concurrency, 
> after the first execution of truncate(...), concurrent requests from other 
> clients may append new data and change the file length. When truncate(...) is 
> retried after that, it finds that the file has not been truncated to the 
> same length and truncates it again, which causes data loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
