[
https://issues.apache.org/jira/browse/HADOOP-17833?focusedWorklogId=780933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780933
]
ASF GitHub Bot logged work on HADOOP-17833:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jun/22 21:03
Start Date: 13/Jun/22 21:03
Worklog Time Spent: 10m
Work Description: mukund-thakur commented on code in PR #3289:
URL: https://github.com/apache/hadoop/pull/3289#discussion_r896054309
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/CommitterTestHelper.java:
##########
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.List;
+
+import org.assertj.core.api.Assertions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.MultipartTestUtils;
+import org.apache.hadoop.fs.s3a.S3AFileSystem;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.verifyPathExists;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.BASE;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.MAGIC;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.STREAM_CAPABILITY_MAGIC_OUTPUT;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.XA_MAGIC_MARKER;
+import static org.apache.hadoop.fs.s3a.commit.impl.CommitOperations.extractMagicFileLength;
+
+/**
+ * Helper for committer tests: extra assertions and the like.
+ */
+public class CommitterTestHelper {
+
+ private static final Logger LOG =
+ LoggerFactory.getLogger(CommitterTestHelper.class);
+
+ /**
+ * Filesystem under test.
+ */
+ private final S3AFileSystem fileSystem;
+
+ /**
+ * Constructor.
+ * @param fileSystem filesystem to work with.
+ */
+ public CommitterTestHelper(S3AFileSystem fileSystem) {
+ this.fileSystem = requireNonNull(fileSystem);
+ }
+
+ /**
+ * Get the filesystem.
+ * @return the filesystem.
+ */
+ public S3AFileSystem getFileSystem() {
+ return fileSystem;
+ }
+
+ /**
+ * Verify that the path at the end of a commit exists.
+ * This does not validate the size.
+ * @param commit commit to verify
+ * @throws FileNotFoundException dest doesn't exist
+ * @throws ValidationFailure commit arg is invalid
+ * @throws IOException invalid commit, IO failure
+ */
+ public void verifyCommitExists(SinglePendingCommit commit)
Review Comment:
This is an unused method.
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitOperations.java:
##########
@@ -592,13 +634,41 @@ public void jobCompleted(boolean success) {
}
/**
- * Begin the final commit.
+ * Crate a commit context for a job or task.
Review Comment:
nit: create
##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdataoutputstreambuilder.md:
##########
@@ -182,3 +182,58 @@ see `FileSystem#create(path, ...)` and
`FileSystem#append()`.
result = FSDataOutputStream
The result is `FSDataOutputStream` to be used to write data to filesystem.
+
+
+## <a name="s3a"></a> S3A-specific options
+
+Here are the custom options which the S3A Connector supports.
+
+| Name                        | Type      | Meaning                                |
+|-----------------------------|-----------|----------------------------------------|
+| `fs.s3a.create.performance` | `boolean` | create a file with maximum performance |
+| `fs.s3a.create.header`      | `string`  | prefix for user supplied headers       |
+### `fs.s3a.create.performance`
+
+Prioritize file creation performance over safety checks for filesystem consistency.
+
+This:
+1. Skips the `LIST` call which makes sure a file is not being created over a directory.
+ Risk: a file is created over a directory.
+1. Ignores the overwrite flag.
+1. Never issues a `DELETE` call to delete parent directory markers.
+
+It is possible to probe an S3A Filesystem instance for this capability through
+the `hasPathCapability(path, "fs.s3a.create.performance")` check.
+
+Creating files with this option over existing directories is likely
+to make S3A filesystem clients behave inconsistently.
+
+Operations optimized for directories (e.g. listing calls) are likely
+to see the directory tree, not the file; operations optimized for
+files (`getFileStatus()`, `isFile()`) are more likely to see the file.
+The exact form of the inconsistencies, and which operations/parameters
+trigger them, are undefined and may change between even minor releases.
+
+Using this option is the equivalent of pressing and holding down the
+"Electronic Stability Control"
+button on a rear-wheel drive car for five seconds: the safety checks are off.
+Things will be faster if the driver knew what they were doing.
+If they didn't, the fact they had held the button down will
+be used as evidence at the inquest as proof that they made a
+conscious decision to choose speed over safety and
+that the outcome was their own fault.
+
+Accordingly: *Use if and only if you are confident that the conditions are met.*
Review Comment:
Initially I was worried about inconsistencies leading to escalations but by
the end I think we are clear enough. Nice doc.
Issue Time Tracking
-------------------
Worklog Id: (was: 780933)
Time Spent: 11h 50m (was: 11h 40m)
> Improve Magic Committer Performance
> -----------------------------------
>
> Key: HADOOP-17833
> URL: https://issues.apache.org/jira/browse/HADOOP-17833
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.3.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 11h 50m
> Remaining Estimate: 0h
>
> Magic committer tasks can be slow because every file created with
> overwrite=false triggers a HEAD (verify there's no file) and a LIST (verify
> there's no dir). And because of delayed manifestations, it may not behave as
> expected.
> ParquetOutputFormat is one example of a library which does this.
> We could fix Parquet to use overwrite=true, but (a) there may be surprises in
> other uses, (b) it'd still leave the LIST, and (c) it would do nothing for
> other formats.
> Proposed: createFile() under a magic path to skip all probes for file/dir at
> end of path.
> Only a single task attempt will be writing to that directory and it should
> know what it is doing. If there are conflicting file names and parts across
> tasks, that won't even get picked up at this point. Oh, and none of the
> committers ever check for this: you'll get the last file manifested (s3a) or
> renamed (file).
> If we skip the checks we will save 2 HTTP requests/file.
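
The saving quoted in the description can be put into numbers. What follows is a minimal sketch, assuming only the request counts stated above: a HEAD plus a LIST when overwrite=false, a LIST alone when overwrite=true, and no probes when they are skipped. The class and method names (`ProbeCostModel`, `probesPerCreate`, `requestsSaved`) are hypothetical and not part of the patch; the code models counts only and makes no S3 calls.

```java
/** Hypothetical model of the probe requests issued per file creation. */
final class ProbeCostModel {

  private ProbeCostModel() {
  }

  /**
   * Probe requests issued before a single file is created.
   * @param overwrite the overwrite flag passed to create
   * @param skipProbes true if all file/dir probes are skipped
   * @return number of HTTP probe requests
   */
  static int probesPerCreate(boolean overwrite, boolean skipProbes) {
    if (skipProbes) {
      return 0;
    }
    // HEAD: verify there's no file; only needed when overwrite=false.
    int head = overwrite ? 0 : 1;
    // LIST: verify there's no directory; issued regardless of overwrite.
    int list = 1;
    return head + list;
  }

  /**
   * Requests saved across a number of files by skipping the probes.
   * @param files number of files created
   * @param overwrite the overwrite flag the writer uses
   * @return total probe requests avoided
   */
  static long requestsSaved(long files, boolean overwrite) {
    int before = probesPerCreate(overwrite, false);
    int after = probesPerCreate(overwrite, true);
    return files * (before - after);
  }

  public static void main(String[] args) {
    // 1000 files written with overwrite=false: 2 probes avoided per file.
    System.out.println(requestsSaved(1000, false)); // prints 2000
    // The same files written with overwrite=true: only the LIST is avoided.
    System.out.println(requestsSaved(1000, true));  // prints 1000
  }
}
```

Under these assumptions, a writer that already uses overwrite=true saves only one request per file, which matches point (b) in the description.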