[ https://issues.apache.org/jira/browse/HADOOP-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manish Bhatt resolved HADOOP-19381.
-----------------------------------
    Resolution: Duplicate

> [ABFS] Support Rename and Delete operation over FNS-Blob endpoint
> -----------------------------------------------------------------
>
>                 Key: HADOOP-19381
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19381
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/azure
>    Affects Versions: 3.5.0
>            Reporter: Manish Bhatt
>            Assignee: Manish Bhatt
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, rename and delete operations are supported only on the DFS
> endpoint. The Blob endpoint needs its own support for these operations
> because it does not account for hierarchy, so the client has to ensure that
> the HDFS contracts are still maintained. Renaming or deleting a directory
> over the Blob endpoint requires the client to handle the orchestration and
> rename or delete every blob within the specified directory.
>
> This task outlines the considerations for implementing rename and delete
> operations on the FNS-Blob endpoint so that they stay compatible with the
> HDFS contracts.
> * {*}Blob Endpoint Usage{*}: The code needs an abstraction that maintains
> the HDFS contracts while performing rename and delete operations on the Blob
> endpoint, which does not support hierarchy.
> * {*}Rename Operations{*}: The {{AzureBlobFileSystem#rename()}} method will
> use a {{RenameHandler}} instance to handle rename operations, with separate
> handlers for the DFS and Blob endpoints. The method covers prechecks,
> destination adjustments, and orchestration of directory renames for blobs.
> * {*}Atomic Rename{*}: Rename atomicity needs explicit handling on the Blob
> endpoint, because a directory rename there is orchestrated by the client as
> a copy or delete of each blob within the directory. A configuration will
> allow developers to specify directories for atomic renaming, with a JSON
> file to track the status of renames.
> * {*}Delete Operations{*}: Delete operations are simpler than renames and
> require fewer HDFS contract checks. On the Blob endpoint, the client must
> handle the orchestration, including orphaned directories created by AzCopy.
> * {*}Orchestration for Rename/Delete{*}: Orchestration for rename and delete
> operations over the Blob endpoint involves listing blobs and performing the
> action on each blob. The process must handle large numbers of blobs
> efficiently.
> * {*}Need for Optimization{*}: Optimization is crucial because the
> {{ListBlob}} API returns at most 5000 blobs per call, so large directories
> need multiple calls. The task proposes a producer-consumer model that
> processes blobs in parallel, reducing processing time and memory usage.
> * {*}Producer-Consumer Design{*}: The proposed design includes a producer
> that lists blobs, a queue that stores the listed blobs, and consumers that
> process them in parallel. This approach aims to improve efficiency and
> mitigate memory issues.
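
The producer-consumer design described in the issue can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class BlobOrchestrationSketch and the methods listPage and processAll are hypothetical and are not taken from the ABFS code base, the listing is simulated from an in-memory list rather than a real ListBlob call, and the per-blob action is passed in by the caller. The point is only to show one way a bounded queue can keep memory use flat while several consumers work through the pages a producer lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these names come from the ABFS code base. */
final class BlobOrchestrationSketch {
  // Unique sentinel compared by identity, so it can never clash with a real path.
  private static final String END_OF_LISTING = new String("END_OF_LISTING");

  /** Stand-in for one paginated listing call: at most pageSize paths from offset. */
  static List<String> listPage(List<String> allBlobs, int offset, int pageSize) {
    int end = Math.min(offset + pageSize, allBlobs.size());
    return new ArrayList<>(allBlobs.subList(offset, end));
  }

  /** Lists page by page and applies action to every path; returns the success count. */
  static int processAll(List<String> allBlobs, int pageSize, int consumerCount,
      Consumer<String> action) {
    // Bounded queue: the producer blocks once one page worth of paths is pending.
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(pageSize);
    AtomicInteger processed = new AtomicInteger();
    AtomicReference<RuntimeException> firstFailure = new AtomicReference<>();
    ExecutorService pool = Executors.newFixedThreadPool(consumerCount);
    List<Future<?>> workers = new ArrayList<>();
    try {
      // Consumers: take paths until the sentinel arrives.
      for (int i = 0; i < consumerCount; i++) {
        workers.add(pool.submit(() -> {
          while (true) {
            String path = queue.take();
            if (path == END_OF_LISTING) {
              return null;
            }
            try {
              action.accept(path);
              processed.incrementAndGet();
            } catch (RuntimeException e) {
              // Remember the first failure but keep draining so the producer never stalls.
              firstFailure.compareAndSet(null, e);
            }
          }
        }));
      }
      // Producer (calling thread): list one page at a time and enqueue its paths.
      for (int offset = 0; offset < allBlobs.size(); offset += pageSize) {
        for (String path : listPage(allBlobs, offset, pageSize)) {
          queue.put(path);
        }
      }
      // One sentinel per consumer signals that listing is complete.
      for (int i = 0; i < consumerCount; i++) {
        queue.put(END_OF_LISTING);
      }
      for (Future<?> worker : workers) {
        worker.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while orchestrating", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("consumer failed", e.getCause());
    } finally {
      pool.shutdownNow();
    }
    if (firstFailure.get() != null) {
      throw firstFailure.get();
    }
    return processed.get();
  }
}
```

A caller might invoke BlobOrchestrationSketch.processAll(paths, 5000, 8, path -> deleteBlob(path)), where deleteBlob is again a hypothetical per-blob operation. In a real client the producer would presumably follow the continuation token returned by each listing call instead of an integer offset, and a rename would also need the status tracking mentioned under Atomic Rename, which this sketch does not attempt.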