[ https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443263#comment-17443263 ]
Xiaoqiao He commented on HDFS-16322:
------------------------------------

Thanks [~Nsupyq] for your report. It is an interesting case. I am not sure whether it was reasonable to disable `idempotent` in HDFS-7926, but it can cause data loss when the client retries a request because of network or other issues. IMO, it could be fixed by enabling `idempotent` for truncate.
{quote}"idempotent" means applying the same operations multiple times will get the same result. If there is an append in the middle, the retry could get different results. E.g. getPermission is idempotent. However, if there is a setPermission (or delete, rename, etc.) in the middle, the retry of getPermission could get a different result.{quote}
I just noticed that [~szetszwo] left this comment at HDFS-7926; I am not sure whether that explanation still holds now. For example, Client A requests `create` with the overwrite option; it executes successfully on the NameNode side but no response reaches Client A, so Client A retries. Before the retry reaches the NameNode, another Client B deletes this file. The retry is then invoked and returns the previous result because of the retry cache. It is the same case as `truncate`.

cc [~shv], [~szetszwo], would you mind giving some suggestions? Thanks.

> The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-16322
>                 URL: https://issues.apache.org/jira/browse/HDFS-16322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 and Apache Maven 3.6.0.
> The bug can be reproduced by the testMultipleTruncate() in the attachment. First, replace the file TestFileTruncate.java under the directory "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/" with the attachment. Then run "mvn test -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate" to run the test case. Finally, the "assertFileLength(p, n+newLength)" at line 199 of TestFileTruncate.java fails, because the retry of truncate() changes the file size and causes data loss.
>            Reporter: nhaorand
>            Priority: Major
>         Attachments: TestFileTruncate.java
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data loss. If the DFSClient drops the first response of a truncate RPC call, the retry through the retry cache will truncate the file again and cause data loss.
>
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the file is already being truncated with the same length. However, under concurrency, after the first execution of truncate(...), concurrent requests from other clients may append new data and change the file length. When truncate(...) is retried after that, it finds that the file has not been truncated to the same length and truncates it again, which causes data loss.
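To make the interleaving concrete, below is a minimal, self-contained sketch (not the attached TestFileTruncate.java) that drives the same sequence of operations through the public DistributedFileSystem API on a MiniDFSCluster from the HDFS test jar. The class name, path, and sizes are made up for illustration, and the second client-side truncate only stands in for the server-side replay of the first request after its response is lost; it does not exercise the real retry path.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Illustrative sketch of the truncate/append/retried-truncate interleaving.
public class TruncateRetrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path p = new Path("/truncate-retry-demo");

      // Write 16 bytes, then truncate to 8. This stands for the first
      // truncate RPC whose response never reaches the client.
      try (FSDataOutputStream out = fs.create(p)) {
        out.write(new byte[16]);
      }
      boolean done = fs.truncate(p, 8);
      // Truncating inside a block needs block recovery, in which case
      // truncate() returns false; wait until the file is closed again.
      while (!done && !fs.isFileClosed(p)) {
        Thread.sleep(100);
      }

      // A concurrent client appends 8 bytes, so the file is 16 bytes again.
      try (FSDataOutputStream out = fs.append(p)) {
        out.write(new byte[8]);
      }

      // This second call stands for the replayed truncate(p, 8). The file is
      // no longer at the requested length, so the NameNode truncates once
      // more and silently discards the 8 freshly appended bytes.
      boolean retryDone = fs.truncate(p, 8);
      while (!retryDone && !fs.isFileClosed(p)) {
        Thread.sleep(100);
      }

      // Prints 8, although the appending client expects 16.
      System.out.println("final length = " + fs.getFileStatus(p).getLen());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}

The appending client believes the file ends with its 8 new bytes, but after the replayed truncate the length is back to 8, which is the silent data loss described above.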