[ https://issues.apache.org/jira/browse/HADOOP-19233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923212#comment-17923212 ]
ASF GitHub Bot commented on HADOOP-19233:
-----------------------------------------

anmolanmol1234 commented on code in PR #7265:
URL: https://github.com/apache/hadoop/pull/7265#discussion_r1938969052


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ListBlobQueue.java:
##########
@@ -170,7 +170,7 @@ private List<Path> dequeue() {
     return pathListForConsumption;
   }
 
-  private synchronized int size() {
+  synchronized int size() {

Review Comment:
   What is the need for this removal?



> ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-19233
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19233
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.4.0
>            Reporter: Anuj Modi
>            Assignee: Manish Bhatt
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, we only support rename and delete operations on the DFS endpoint.
> Supporting rename and delete operations on the Blob endpoint takes extra
> work because the Blob endpoint does not account for hierarchy. We need to
> ensure that the HDFS contracts are maintained when performing rename and
> delete operations. Renaming or deleting a directory over the Blob endpoint
> requires the client to handle the orchestration and rename or delete all the
> blobs within the specified directory.
>
> The task outlines the considerations for implementing rename and delete
> operations for the FNS-blob endpoint to ensure compatibility with HDFS
> contracts.
> * {*}Blob Endpoint Usage{*}: The task addresses the need for abstraction in
> the code to maintain HDFS contracts while performing rename and delete
> operations on the Blob endpoint, which does not support hierarchy.
> * {*}Rename Operations{*}: The {{AzureBlobFileSystem#rename()}} method will
> use a {{RenameHandler}} instance to handle rename operations, with separate
> handlers for the DFS and Blob endpoints. This method includes prechecks,
> destination adjustments, and orchestration of directory renaming for blobs.
> * {*}Atomic Rename{*}: Atomic renaming is essential for the Blob endpoint,
> because a directory rename is orchestrated as a copy and a delete of each
> blob within the directory and can be interrupted partway through. A
> configuration will allow developers to specify directories for atomic
> renaming, with a JSON file to track the status of renames.
> * {*}Delete Operations{*}: Delete operations are simpler than renames,
> requiring fewer HDFS contract checks. For the Blob endpoint, the client must
> handle orchestration, including managing orphaned directories created by
> AzCopy.
> * {*}Orchestration for Rename/Delete{*}: Orchestration for rename and delete
> operations over the Blob endpoint involves listing blobs and performing an
> action on each blob. The process must be optimized to handle large numbers
> of blobs efficiently.
> * {*}Need for Optimization{*}: Optimization is crucial because the
> {{ListBlob}} API can return a maximum of 5000 blobs per call, necessitating
> multiple calls for large directories. The task proposes a producer-consumer
> model to process blobs in parallel, reducing both processing time and
> memory usage.
> * {*}Producer-Consumer Design{*}: The proposed design includes a producer
> that lists blobs, a queue that stores them, and a consumer that processes
> them in parallel. This approach aims to improve efficiency and mitigate
> memory issues.
> More details will follow.
> Prerequisites for this patch:
> 1. HADOOP-19187 ABFS: [FnsOverBlob]Making AbfsClient Abstract for supporting
> both DFS and Blob Endpoint - ASF JIRA (apache.org)
> 2. HADOOP-19226 ABFS: [FnsOverBlob]Implementing Azure Rest APIs on Blob
> Endpoint for AbfsBlobClient - ASF JIRA (apache.org)
> 3. HADOOP-19207 ABFS: [FnsOverBlob]Response Handling of Blob Endpoint APIs
> and Metadata APIs - ASF JIRA (apache.org)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
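The atomic-rename bullet describes recording the state of a directory rename in a JSON file, so that an interrupted copy-and-delete can be finished later. Below is a minimal Java sketch of that idea. It rests on the following assumptions:

* The container is an in-memory `TreeMap`.
* The marker blob is named `<dir>-RenamePending.json` and uses a made-up JSON layout.
* `FlatRenameSketch`, `renameDirectory`, `redo`, `recover` and `markerFor` are hypothetical names. They are not the `RenameHandler` API from this patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: rename a "directory" on a flat blob namespace by
 * copying and then deleting every blob under its prefix, guarded by a
 * pending-rename JSON marker. Store, marker name and JSON layout are
 * illustrative assumptions, not the ABFS format.
 */
public class FlatRenameSketch {

  /** In-memory stand-in for the container: blob name -> contents. */
  final Map<String, String> store = new TreeMap<>();

  /** Hypothetical marker blob name for a rename of srcDir. */
  static String markerFor(String srcDir) {
    return srcDir + "-RenamePending.json";
  }

  void renameDirectory(String srcDir, String dstDir) {
    String marker = markerFor(srcDir);
    // 1. Record the intent before touching any blob (no JSON escaping here).
    store.put(marker, "{\"src\":\"" + srcDir + "\",\"dst\":\"" + dstDir + "\"}");
    // 2. Copy each child to the destination, then delete the source blob.
    redo(srcDir, dstDir);
    // 3. The rename is complete once the marker is gone.
    store.remove(marker);
  }

  /** Idempotent: rerunning it after a crash finishes the remaining blobs. */
  void redo(String srcDir, String dstDir) {
    String prefix = srcDir + "/";
    List<String> children = new ArrayList<>();
    for (String name : store.keySet()) {
      if (name.startsWith(prefix)) {
        children.add(name);
      }
    }
    for (String src : children) {
      String dst = dstDir + "/" + src.substring(prefix.length());
      store.put(dst, store.get(src)); // copy
      store.remove(src);              // then delete the source
    }
  }

  /** Real code would read dstDir back from the marker's JSON. */
  void recover(String srcDir, String dstDir) {
    if (store.containsKey(markerFor(srcDir))) {
      redo(srcDir, dstDir);
      store.remove(markerFor(srcDir));
    }
  }
}
```

Because `redo` only acts on blobs that still sit under the source prefix, it can be rerun any number of times. A crash between a copy and its delete simply causes that blob to be copied again with the same contents.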
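For delete, the main HDFS contract checks are these:

* Deleting a path that does not exist returns `false`.
* A non-recursive delete of a non-empty directory must fail with an `IOException`.

The sketch below applies those checks over a flat namespace. It assumes that an "orphaned" directory (such as one left behind by AzCopy) is a prefix that has child blobs but no marker blob of its own, and it treats such a prefix as an existing directory. That reading of the description is an assumption. `FlatDeleteSketch` and its in-memory store are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of HDFS-style delete over a flat blob namespace. */
public class FlatDeleteSketch {

  /** In-memory stand-in for the container: blob name -> contents. */
  final Map<String, String> store = new TreeMap<>();

  boolean delete(String dir, boolean recursive) throws IOException {
    String prefix = dir + "/";
    List<String> children = new ArrayList<>();
    for (String name : store.keySet()) {
      if (name.startsWith(prefix)) {
        children.add(name);
      }
    }
    // Assumption: a prefix with children but no marker blob (an "orphaned"
    // directory) still counts as an existing directory.
    boolean exists = store.containsKey(dir) || !children.isEmpty();
    if (!exists) {
      return false; // contract: deleting a missing path returns false
    }
    if (!recursive && !children.isEmpty()) {
      throw new IOException(dir + " is a non-empty directory");
    }
    for (String child : children) {
      store.remove(child); // client-side orchestration: one delete per blob
    }
    store.remove(dir); // no-op when the directory has no marker blob
    return true;
  }
}
```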
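The producer-consumer bullets can be sketched with a `BlockingQueue`. A single producer pages through the listing at most 5000 entries at a time, and a fixed pool of consumer threads drains the queue in parallel. Several pieces are hypothetical stand-ins and are not taken from the patch:

* The in-memory store and `listPage` stand in for the `ListBlob` call.
* `ListingPipelineSketch`, `MAX_RESULTS`, `END` and `run` are illustrative names.
* The queue capacity of two pages is an assumption chosen to show back-pressure. None of this is the actual `ListBlobQueue` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical producer-consumer sketch; not the ABFS ListBlobQueue. */
public class ListingPipelineSketch {

  /** The ListBlob API returns at most 5000 entries per call. */
  static final int MAX_RESULTS = 5000;

  /** Sentinel telling a consumer that the listing is exhausted. */
  private static final String END = "<end-of-listing>";

  /** Hypothetical stand-in for one paged ListBlob call. */
  static List<String> listPage(List<String> store, int start) {
    return store.subList(start, Math.min(start + MAX_RESULTS, store.size()));
  }

  static Set<String> run(int blobCount, int consumers) throws InterruptedException {
    List<String> store = new ArrayList<>();
    for (int i = 0; i < blobCount; i++) {
      store.add("dir/blob-" + i);
    }
    // Bounded queue: put() blocks when consumers fall behind.
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(2 * MAX_RESULTS);
    Set<String> processed = ConcurrentHashMap.newKeySet();
    ExecutorService pool = Executors.newFixedThreadPool(consumers);

    for (int c = 0; c < consumers; c++) {
      pool.submit(() -> {
        try {
          for (String blob = queue.take(); !blob.equals(END); blob = queue.take()) {
            processed.add(blob); // real code would copy or delete the blob here
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        return null;
      });
    }

    // Producer: keep issuing paged listing calls until nothing is left.
    int next = 0;
    while (next < store.size()) {
      List<String> page = listPage(store, next);
      for (String blob : page) {
        queue.put(blob);
      }
      next += page.size();
    }
    for (int c = 0; c < consumers; c++) {
      queue.put(END); // one sentinel per consumer so every thread exits
    }
    pool.shutdown();
    if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
      throw new IllegalStateException("consumers did not finish");
    }
    return processed;
  }
}
```

Because the queue is bounded, the producer blocks whenever the consumers fall behind. Listed-but-unprocessed entries held in memory therefore stay at roughly the queue capacity plus the page currently being enqueued, however large the directory is.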