[
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339535#comment-17339535
]
Viraj Jasani edited comment on HDFS-15982 at 5/8/21, 12:27 PM:
---------------------------------------------------------------
{quote}Regarding the UI. If the trash interval isn't set and If I select
NO(move to trash), It still deletes with success? Check if the behaviour is
like that, The client may be in wrong impression that things moved to trash,
but it actually didn't. We should have bugged him back, Trash isn't enabled.
{quote}
If the trash interval isn't set and we select NO, the delete does succeed
(as per the logic
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L1560],
a file is moved to trash only if skiptrash is false and trashInterval > 0):
{code:java}
case DELETE: {
  Configuration conf =
      (Configuration) context.getAttribute(JspHelper.CURRENT_CONF);
  long trashInterval =
      conf.getLong(FS_TRASH_INTERVAL_KEY, FS_TRASH_INTERVAL_DEFAULT);
  if (trashInterval > 0 && !skipTrash.getValue()) {
    LOG.info("{} is {} , trying to archive {} instead of removing",
        FS_TRASH_INTERVAL_KEY, trashInterval, fullpath);
    org.apache.hadoop.fs.Path path =
        new org.apache.hadoop.fs.Path(fullpath);
    Configuration clonedConf = new Configuration(conf);
    // To avoid caching FS objects and prevent OOM issues
    clonedConf.set("fs.hdfs.impl.disable.cache", "true");
    FileSystem fs = FileSystem.get(clonedConf);
    boolean movedToTrash = Trash.moveToAppropriateTrash(fs, path,
        clonedConf);
    if (movedToTrash) {
      final String js = JsonUtil.toJsonString("boolean", true);
      return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
    }
    // Same is the behavior with Delete shell command.
    // If moveToAppropriateTrash() returns false, file deletion
    // is attempted rather than throwing Error.
    LOG.debug("Could not move {} to Trash, attempting removal", fullpath);
  }
  final boolean b = cp.delete(fullpath, recursive.getValue());
  final String js = JsonUtil.toJsonString("boolean", b);
  return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
}
{code}
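For quick reference, here is a minimal client-side sketch of what this looks like over the REST API. The host, port, path and user below are placeholders, and I am assuming the query parameter introduced by this change is named skiptrash:
{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsDeleteSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder NameNode address, file path and user; "skiptrash" is the
    // query parameter assumed to be added by this change.
    URL url = new URL("http://namenode:9870/webhdfs/v1/tmp/demo.txt"
        + "?op=DELETE&recursive=false&skiptrash=false&user.name=hdfs");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("DELETE");
    // With fs.trash.interval > 0 the NameNode moves /tmp/demo.txt under the
    // user's .Trash; with the interval unset (0), it is hard deleted anyway.
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}
{code}
In both cases the endpoint returns the same boolean JSON payload, which is exactly why the UI hint below matters.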
I think in the UI we can provide additional info in the same modal: "These
buttons are useful only if fs.trash.interval has been configured. Without
setting the interval, files will be hard deleted anyway."
I just saw HDFS-14117 on using trash on RBF: "delete the files or dirs of one
subcluster in a cluster with multiple subclusters".
{quote}Router would also have issues, if the trash path resolves to different
NS, or to some path which isn't in the Mount Table, when Default Namespace
isn't configured.
{quote}
I tried to explore whether we can replace everything that
Trash.moveToAppropriateTrash() offers with what ClientProtocol provides (that
way, Router and Mount Table resolution would be taken care of), but it seems
almost impossible to replace all the FileSystem utilities used by Trash.
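For context, here is a rough, from-memory approximation (not verbatim Hadoop code) of the FileSystem-level resolution that Trash.moveToAppropriateTrash() relies on; the resolvePath()/getTrashRoot() steps are exactly the part that has no direct ClientProtocol equivalent:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TrashResolutionSketch {
  // Rough approximation only: resolve the path (symlinks, ViewFs/RBF mount
  // points), pick the right trash root (user dir or encryption zone), then
  // rename into it. ClientProtocol alone cannot perform this resolution.
  static boolean moveToTrashLike(FileSystem fs, Path p, Configuration conf)
      throws IOException {
    Path resolved = fs.resolvePath(p);
    FileSystem resolvedFs = FileSystem.get(resolved.toUri(), conf);
    Path trashCurrent = new Path(resolvedFs.getTrashRoot(resolved), "Current");
    resolvedFs.mkdirs(trashCurrent);
    return resolvedFs.rename(resolved, Path.mergePaths(trashCurrent, resolved));
  }
}
{code}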
Considering all the discussion so far (and further feedback from Ayush and
Wei-Chiu), my proposals for addendum PR#2976:
* Keep the default value of skiptrash as true (thereby keeping the default
behaviour of the HTTP API compatible with existing releases).
* Update the doc and tests accordingly.
* Let the NamenodeWebHdfsMethods DELETE endpoint perform the "move file to
.Trash dir" operation with skiptrash false only if the NameNode uses
NameNodeRpcServer (i.e. the default HDFS FileSystem), because RouterRpcServer
requires special treatment that the Trash utility does not provide as of
today (a rough sketch of this check follows below). If we agree on this, we
need to document the option properly. This is not yet present in addendum
PR#2976; I will attempt it only once we have consensus.
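To make the third point concrete, one possible shape of that check (purely illustrative, not what the addendum PR contains today) would be:
{code:java}
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer;

public class TrashGateSketch {
  // Hypothetical guard for the DELETE handler: take the trash branch only
  // when the ClientProtocol behind WebHDFS is the plain NameNodeRpcServer,
  // so RouterRpcServer (RBF) keeps today's plain-delete behaviour.
  static boolean shouldMoveToTrash(ClientProtocol cp, long trashInterval,
      boolean skipTrash) {
    return trashInterval > 0 && !skipTrash && cp instanceof NameNodeRpcServer;
  }
}
{code}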
> Deleted data using HTTP API should be saved to the trash
> --------------------------------------------------------
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: hdfs, hdfs-client, httpfs, webhdfs
> Reporter: Bhavik Patel
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot
> 2021-04-23 at 4.36.57 PM.png
>
> Time Spent: 12h 20m
> Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the
> configured/default Trash directory and be removed only after the trash
> interval elapses. Currently, data is removed from the system directly.
> [This behavior should be the same as the CLI command.]
> This can be helpful when a user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well,
> which should be accessible through the Web UI.