[
https://issues.apache.org/jira/browse/HADOOP-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924100#comment-17924100
]
ASF GitHub Bot commented on HADOOP-19448:
-----------------------------------------
anujmodi2021 commented on code in PR #7353:
URL: https://github.com/apache/hadoop/pull/7353#discussion_r1943022172
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/ConcurrentWriteOperationDetectedException.java:
##########
@@ -26,7 +26,10 @@
public class ConcurrentWriteOperationDetectedException
extends AzureBlobFileSystemException {
- public ConcurrentWriteOperationDetectedException(String message) {
- super(message);
+ private static final String ERROR_MESSAGE = "Parallel access to the create
path detected. Failing request "
+ + "to honor single writer semantics";
+
+ public ConcurrentWriteOperationDetectedException() {
Review Comment:
Let's add javadoc for the usage of this exception
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -555,58 +545,254 @@ public AbfsRestOperation createPath(final String path,
final AbfsRestOperation op = getAbfsRestOperation(
AbfsRestOperationType.PutBlob,
HTTP_METHOD_PUT, url, requestHeaders);
+ op.execute(tracingContext);
+ return op;
+ }
+
+ /**
+ * Checks if the specified path is a directory by listing its contents.
+ *
+ * @param path the path to check.
+ * @param tracingContext the tracing context for the service call.
+ * @return true if the path is a directory and contains entries, false
otherwise.
+ * @throws AzureBlobFileSystemException if the rest operation fails.
+ */
+ private boolean checkDirectoryByList(String path,
+ TracingContext tracingContext)
+ throws AzureBlobFileSystemException {
+ AbfsRestOperation listPathOp = listPath(path, false, 1, null,
+ tracingContext, false);
+ AbfsHttpOperation listPathResult = listPathOp.getResult();
+ if (listPathResult != null) {
+ // Determine if the path is a directory by checking if the list result
schema has any paths
+ return !listPathResult.getListResultSchema().paths().isEmpty();
+ }
+ return false;
+ }
+
+ /**
+ * Checks if the specified path exists as a directory.
+ *
+ * @param path the path of the directory to check.
+ * @param tracingContext the tracing context for the service call.
+ * @return true if the directory exists, false otherwise.
+ * @throws AzureBlobFileSystemException if the rest operation fails.
+ */
+ private boolean checkForDirectoryExistence(String path,
Review Comment:
Nit: I would also suggest to move all the private utility methods towards
end of file after alll public overrides
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -486,49 +506,19 @@ public AbfsRestOperation createPath(final String path,
* @param isAppendBlob whether the path is an append blob.
* @param eTag the eTag of the path.
* @param contextEncryptionAdapter the context encryption adapter.
- * @param tracingContext the tracing context.
- * @param isCreateCalledFromMarkers whether the create is called from
markers.
+ * @param tracingContext the tracing context for the service call.
* @return the executed rest operation containing the response from the
server.
* @throws AzureBlobFileSystemException if the rest operation fails.
*/
- public AbfsRestOperation createPath(final String path,
+ public AbfsRestOperation createPathRestOp(final String path,
final boolean isFile,
final boolean overwrite,
final AzureBlobFileSystemStore.Permissions permissions,
Review Comment:
permissions is not used. We can remove as its not a overriden method now.
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -555,58 +545,254 @@ public AbfsRestOperation createPath(final String path,
final AbfsRestOperation op = getAbfsRestOperation(
AbfsRestOperationType.PutBlob,
HTTP_METHOD_PUT, url, requestHeaders);
+ op.execute(tracingContext);
+ return op;
+ }
+
+ /**
+ * Checks if the specified path is a directory by listing its contents.
+ *
+ * @param path the path to check.
+ * @param tracingContext the tracing context for the service call.
+ * @return true if the path is a directory and contains entries, false
otherwise.
+ * @throws AzureBlobFileSystemException if the rest operation fails.
+ */
+ private boolean checkDirectoryByList(String path,
+ TracingContext tracingContext)
+ throws AzureBlobFileSystemException {
+ AbfsRestOperation listPathOp = listPath(path, false, 1, null,
+ tracingContext, false);
+ AbfsHttpOperation listPathResult = listPathOp.getResult();
+ if (listPathResult != null) {
+ // Determine if the path is a directory by checking if the list result
schema has any paths
+ return !listPathResult.getListResultSchema().paths().isEmpty();
Review Comment:
Use `isNonEmptyListing()` here. Sometimes the list calls return no paths
with a Continuation Token. Those cases will be misssed.
It will alos make code cleaner as it will call list with result parsing
internally
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -555,58 +545,254 @@ public AbfsRestOperation createPath(final String path,
final AbfsRestOperation op = getAbfsRestOperation(
AbfsRestOperationType.PutBlob,
HTTP_METHOD_PUT, url, requestHeaders);
+ op.execute(tracingContext);
+ return op;
+ }
+
+ /**
+ * Checks if the specified path is a directory by listing its contents.
+ *
+ * @param path the path to check.
+ * @param tracingContext the tracing context for the service call.
+ * @return true if the path is a directory and contains entries, false
otherwise.
+ * @throws AzureBlobFileSystemException if the rest operation fails.
+ */
+ private boolean checkDirectoryByList(String path,
+ TracingContext tracingContext)
+ throws AzureBlobFileSystemException {
+ AbfsRestOperation listPathOp = listPath(path, false, 1, null,
+ tracingContext, false);
+ AbfsHttpOperation listPathResult = listPathOp.getResult();
+ if (listPathResult != null) {
+ // Determine if the path is a directory by checking if the list result
schema has any paths
+ return !listPathResult.getListResultSchema().paths().isEmpty();
+ }
+ return false;
+ }
+
+ /**
+ * Checks if the specified path exists as a directory.
+ *
+ * @param path the path of the directory to check.
+ * @param tracingContext the tracing context for the service call.
+ * @return true if the directory exists, false otherwise.
+ * @throws AzureBlobFileSystemException if the rest operation fails.
+ */
+ private boolean checkForDirectoryExistence(String path,
+ TracingContext tracingContext)
+ throws AzureBlobFileSystemException {
+ // Check if the directory contains any entries by listing its contents.
+ if (checkDirectoryByList(path, tracingContext)) {
Review Comment:
Seems like it can be simplified to
`return checkDirectoryByList() || checkEmptyDirectoryPathExists()`
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/BlobRenameHandler.java:
##########
@@ -127,7 +127,7 @@ public boolean execute() throws
AzureBlobFileSystemException {
RenameAtomicity renameAtomicity = null;
if (pathInformation.getIsDirectory()
&& pathInformation.getIsImplicit()) {
- AbfsRestOperation createMarkerOp = getAbfsClient().createPath(
+ AbfsRestOperation createMarkerOp = getAbfsClient().createPathRestOp(
Review Comment:
Same here, why not call createDirectory here directly?
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -555,58 +545,254 @@ public AbfsRestOperation createPath(final String path,
final AbfsRestOperation op = getAbfsRestOperation(
AbfsRestOperationType.PutBlob,
HTTP_METHOD_PUT, url, requestHeaders);
+ op.execute(tracingContext);
+ return op;
+ }
+
+ /**
+ * Checks if the specified path is a directory by listing its contents.
+ *
+ * @param path the path to check.
+ * @param tracingContext the tracing context for the service call.
+ * @return true if the path is a directory and contains entries, false
otherwise.
+ * @throws AzureBlobFileSystemException if the rest operation fails.
+ */
+ private boolean checkDirectoryByList(String path,
Review Comment:
nit: rename function to `checkIsDirectoryPath()` it should be sufficient to
convey the purpose of method I think
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/BlobDeleteHandler.java:
##########
@@ -151,7 +151,7 @@ private void ensurePathParentExist()
throws AzureBlobFileSystemException {
if (!path.isRoot() && !path.getParent().isRoot()) {
try {
- getAbfsClient().createPath(path.getParent().toUri().getPath(),
+ getAbfsClient().createPathRestOp(path.getParent().toUri().getPath(),
Review Comment:
Why are we not calling `createDirectory` here?
> ABFS: [FnsOverBlob][Optimizations] Reduce Network Calls In Create and Mkdir
> Flow
> --------------------------------------------------------------------------------
>
> Key: HADOOP-19448
> URL: https://issues.apache.org/jira/browse/HADOOP-19448
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.2
> Reporter: Anuj Modi
> Assignee: Anmol Asrani
> Priority: Major
> Labels: pull-request-available
>
> Implementing Create and Mkdir file system APIs for FNS(HNS Disabled) accounts
> on Blob Endpoint involves a lot of checks and marker file creations to handle
> implicit explicit cases of paths involved in these APIs.
> This Jira proposes a few optimizations to reduce the network calls wherever
> possible and in case where create/mkdir is bound to fail, it should fail
> faster before doing any post checks,
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]