[ 
https://issues.apache.org/jira/browse/HADOOP-17833?focusedWorklogId=781333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781333
 ]

ASF GitHub Bot logged work on HADOOP-17833:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 18:09
            Start Date: 14/Jun/22 18:09
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on code in PR #3289:
URL: https://github.com/apache/hadoop/pull/3289#discussion_r897142097


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/CommitterTestHelper.java:
##########
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.List;
+
+import org.assertj.core.api.Assertions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.MultipartTestUtils;
+import org.apache.hadoop.fs.s3a.S3AFileSystem;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.verifyPathExists;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.BASE;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.MAGIC;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.STREAM_CAPABILITY_MAGIC_OUTPUT;
+import static org.apache.hadoop.fs.s3a.commit.CommitConstants.XA_MAGIC_MARKER;
+import static org.apache.hadoop.fs.s3a.commit.impl.CommitOperations.extractMagicFileLength;
+
+/**
+ * Helper for committer tests: extra assertions and the like.
+ */
+public class CommitterTestHelper {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(CommitterTestHelper.class);
+
+  /**
+   * Filesystem under test.
+   */
+  private final S3AFileSystem fileSystem;
+
+  /**
+   * Constructor.
+   * @param fileSystem filesystem to work with.
+   */
+  public CommitterTestHelper(S3AFileSystem fileSystem) {
+    this.fileSystem = requireNonNull(fileSystem);
+  }
+
+  /**
+   * Get the filesystem.
+   * @return the filesystem.
+   */
+  public S3AFileSystem getFileSystem() {
+    return fileSystem;
+  }
+
+  /**
+   * Verify that the path at the end of a commit exists.
+   * This does not validate the size.
+   * @param commit commit to verify
+   * @throws FileNotFoundException dest doesn't exist
+   * @throws ValidationFailure commit arg is invalid
+   * @throws IOException invalid commit, IO failure
+   */
+  public void verifyCommitExists(SinglePendingCommit commit)

Review Comment:
   cut it





Issue Time Tracking
-------------------

    Worklog Id:     (was: 781333)
    Time Spent: 12h 10m  (was: 12h)

> Improve Magic Committer Performance
> -----------------------------------
>
>                 Key: HADOOP-17833
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17833
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Magic committer tasks can be slow because every file created with 
> overwrite=false triggers a HEAD (to verify there's no file) and a LIST (to 
> verify there's no dir). And because of delayed manifestations, it may not 
> behave as expected.
> ParquetOutputFormat is one example of a library which does this.
> We could fix parquet to use overwrite=true, but (a) there may be surprises in 
> other uses, (b) it'd still leave the LIST, and (c) it would do nothing for the 
> calls other formats make.
> Proposed: createFile() under a magic path to skip all probes for file/dir at 
> end of path.
> Only a single task attempt will be writing to that directory and it should 
> know what it is doing. If there are conflicting file names and parts across 
> tasks, that won't even get picked up at this point. Oh, and none of the 
> committers ever check for this: you'll get the last file manifested (s3a) or 
> renamed (file).
> If we skip the checks we will save 2 HTTP requests/file.
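
The request arithmetic in the description can be sketched as a toy model. This is illustrative only: it makes no S3 or Hadoop calls, and the class name MagicProbeCount, its methods, and the 1000-file task size are hypothetical, not part of the patch. The probe counts (HEAD plus LIST for overwrite=false, LIST only for overwrite=true, none once probes are skipped) are taken from the issue text above.

```java
/**
 * Illustrative model only: counts the existence probes described in the
 * issue text. It does not call S3 or any Hadoop API, and every name here
 * is hypothetical.
 */
final class MagicProbeCount {

  private MagicProbeCount() {
  }

  /**
   * Probe requests issued before one file is created.
   * @param overwrite value of the overwrite flag passed to create
   * @param skipProbes true if the proposed magic-path shortcut applies
   * @return number of HTTP probe requests
   */
  static int probesPerCreate(boolean overwrite, boolean skipProbes) {
    if (skipProbes) {
      // Proposal: under a magic path, issue neither HEAD nor LIST.
      return 0;
    }
    // LIST: check that there is no directory at the path.
    int list = 1;
    // HEAD: check that there is no file; only needed when overwrite=false.
    int head = overwrite ? 0 : 1;
    return head + list;
  }

  /**
   * Total probe requests for a task attempt writing a number of files.
   */
  static long probesPerTask(int files, boolean overwrite, boolean skipProbes) {
    return (long) files * probesPerCreate(overwrite, skipProbes);
  }

  public static void main(String[] args) {
    int files = 1000; // hypothetical task size
    long before = probesPerTask(files, false, false);
    long after = probesPerTask(files, false, true);
    System.out.println("probes saved: " + (before - after));
  }
}
```

In this model, switching a writer to overwrite=true only removes the HEAD, which is point (b) in the description; skipping the probes under the magic path removes both requests.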



--
This message was sent by Atlassian Jira
(v8.20.7#820007)