[
https://issues.apache.org/jira/browse/HADOOP-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833617#comment-17833617
]
ASF GitHub Bot commented on HADOOP-19072:
-----------------------------------------
steveloughran commented on code in PR #6543:
URL: https://github.com/apache/hadoop/pull/6543#discussion_r1549880873
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/MkdirOperation.java:
##########
@@ -124,7 +132,32 @@ public Boolean execute() throws IOException {
return true;
}
- // Walk path to root, ensuring closest ancestor is a directory, not file
+ // if performance creation mode is set, no need to check
Review Comment:
I was wrong; now that the patch is in, I see where I was mistaken. It's the
versioned buckets where problems surface. Sorry!
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/MkdirOperation.java:
##########
@@ -98,14 +107,13 @@ public Boolean execute() throws IOException {
}
// get the file status of the path.
- // this is done even for a magic path, to avoid always issuing PUT
- // requests. Doing that without a check wouild seem to be an
- // optimization, but it is not because
- // 1. PUT is slower than HEAD
- // 2. Write capacity is less than read capacity on a shard
- // 3. It adds needless entries in versioned buckets, slowing
- // down subsequent operations.
- FileStatus fileStatus = getPathStatusExpectingDir(dir);
+ // this is not done for magic path i.e. performanceCreation mode.
+ // For performanceCreation mode, we would probe for HEAD only.
+ // For non-performance or regular mode, the probe for both HEAD and LIST would
+ // be done.
+ S3AFileStatus fileStatus = performanceCreation
+ ? probePathStatusOrNull(dir, StatusProbeEnum.HEAD_ONLY)
Review Comment:
This will trigger a needless PUT if there isn't a marker, just children,
which hits all the problems listed in the comments of the existing code.
Afraid we need to revert to the old code.
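To illustrate the concern, here is a minimal sketch using a toy in-memory store rather than the real S3A classes. Everything in it (MkdirProbeSketch, ToyStore, the simplified mkdir, the headOnly flag) is hypothetical and only models the behaviour described above: a directory that exists solely through its children has no marker object, so a HEAD-only probe misses it and the marker PUT goes ahead. The file-at-path check that the real code also performs is left out.

```java
import java.util.TreeSet;

/** Toy model of the probe behaviour; not the S3A implementation. */
public class MkdirProbeSketch {

  /** Hypothetical in-memory object store that counts PUT requests. */
  static final class ToyStore {
    private final TreeSet<String> keys = new TreeSet<>();
    int putCount = 0;

    void put(String key) {
      keys.add(key);
      putCount++;
    }

    /** HEAD: true only if an object exists at exactly this key. */
    boolean head(String key) {
      return keys.contains(key);
    }

    /** LIST: true if at least one object key starts with the prefix. */
    boolean listHasEntries(String prefix) {
      // The smallest key >= prefix starts with it iff any key does.
      String first = keys.ceiling(prefix);
      return first != null && first.startsWith(prefix);
    }
  }

  /**
   * Simplified mkdir: probe for the directory, PUT a marker if not found.
   * @param headOnly assumption standing in for the performance-mode probe
   * @return true if a marker object was written
   */
  static boolean mkdir(ToyStore store, String dir, boolean headOnly) {
    String marker = dir.endsWith("/") ? dir : dir + "/";
    boolean exists = store.head(marker)
        || (!headOnly && store.listHasEntries(marker));
    if (exists) {
      return false;
    }
    store.put(marker);
    return true;
  }

  public static void main(String[] args) {
    ToyStore store = new ToyStore();
    // A child object only; no "data/" marker was ever written.
    store.put("data/part-0000");
    int before = store.putCount;

    boolean wroteWithHeadAndList = mkdir(store, "data", false);
    boolean wroteWithHeadOnly = mkdir(store, "data", true);

    System.out.println("HEAD+LIST wrote marker: " + wroteWithHeadAndList);
    System.out.println("HEAD only wrote marker: " + wroteWithHeadOnly);
    System.out.println("extra PUTs: " + (store.putCount - before));
  }
}
```

In this model the HEAD+LIST probe finds the implicit directory and issues no PUT, while the HEAD-only probe does not see it and writes a marker. On a versioned bucket, each such PUT would add another entry.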
> S3A: expand optimisations on stores with "fs.s3a.create.performance"
> --------------------------------------------------------------------
>
> Key: HADOOP-19072
> URL: https://issues.apache.org/jira/browse/HADOOP-19072
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> on an s3a store with fs.s3a.create.performance set, speed up other operations
> * mkdir to skip parent directory check: just do a HEAD to see if there's a
> file at the target location