Ryan Blough created HADOOP-18798:
------------------------------------
Summary: hadoop distcp -delete sends deleted data to null instead
of trash
Key: HADOOP-18798
URL: https://issues.apache.org/jira/browse/HADOOP-18798
Project: Hadoop Common
Issue Type: Bug
Components: documentation
Reporter: Ryan Blough
The DistCp documentation states that the -delete option moves deleted data to
Trash when Trash is enabled:
[https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html#:~:text=%2Ddelete,or%20overwrite%20options].
However, the deleted data does not go to Trash. It is removed permanently, and
the audit log records the operation as cmd=delete with dst=null. I know of two
instances where this misunderstanding has caused data loss.
The documentation should drop the statement that the data goes to Trash and
state instead that the data is deleted.
An earlier reproduction:
hdfs dfs -mkdir -p /tmp/test1/test2
hdfs dfs -put /tmp/test.img /tmp/
hdfs dfs -put /tmp/test.img /tmp/test1/test2/file1
drwxr-xr-x - root supergroup 0 2023-04-17 19:07 /tmp/test1
drwxr-xr-x - hdfs supergroup 0 2023-04-17 19:06 /tmp/test1/test2
-rw-r--r-- 3 hdfs supergroup 1073741824 2023-04-17 19:06 /tmp/test1/test2/file1
hadoop distcp -update -delete /tmp/test.img /tmp/test1
-rw-r--r-- 3 root supergroup 1073741824 2023-04-17 18:52 /tmp/test.img
drwxr-xr-x - root supergroup 0 2023-04-17 19:03 /tmp/test1
-rw-r--r-- 3 hdfs supergroup 1073741824 2023-04-17 19:03 /tmp/test1/test.img
2023-04-17 19:08:44,252 INFO FSNamesystem.audit: allowed=true ugi=hdfs
(auth:SIMPLE) ip=/172.25.41.195 cmd=delete src=/tmp/test1/test2
dst=null perm=null proto
[hdfs@c4401-node2 root]$ date
Mon Apr 17 19:11:22 UTC 2023
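For anyone checking their own clusters, here is a rough shell sketch for telling a permanent delete from a Trash move in the NameNode audit log. It is not part of DistCp. classify_audit is a hypothetical helper name, and the sketch assumes that a Trash move is logged as cmd=rename with a dst under a .Trash directory, while a permanent delete is logged as cmd=delete with dst=null, as in the audit line above.

```shell
# Classify HDFS audit-log lines read from stdin (illustrative helper, not a Hadoop tool).
# Assumption: a move to Trash appears as cmd=rename with dst containing "/.Trash/".
# A permanent delete appears as cmd=delete (dst=null), as in the reported log line.
classify_audit() {
  while IFS= read -r line; do
    case "$line" in
      *"cmd=delete "*)
        echo "PERMANENT_DELETE" ;;
      *"cmd=rename "*"dst="*"/.Trash/"*)
        echo "MOVED_TO_TRASH" ;;
      *)
        echo "OTHER" ;;
    esac
  done
}
```

It could be run as `classify_audit < hdfs-audit.log | sort | uniq -c`, where the log file name is an assumption and depends on the cluster's logging configuration.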