[
https://issues.apache.org/jira/browse/HADOOP-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran reassigned HADOOP-18097:
---------------------------------------
Assignee: Balaji Ganesan
> StagingCommitter getFinalKey method can add an extra / if getS3KeyPrefix
> returns ""
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-18097
> URL: https://issues.apache.org/jira/browse/HADOOP-18097
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.3.1
> Environment: apache-spark 3.2 with hadoop 3.3.1 on Ubuntu 20.04
>
>
>
> Reporter: Balaji Ganesan
> Assignee: Balaji Ganesan
> Priority: Minor
> Attachments: error.log
>
>
> I am testing the staging committer against an on-prem object store using
> Spark terasort and ran into this issue. All of my initiate-MPU (multipart
> upload) requests were failing with the S3 error "key not found". This object
> store doesn't support virtual-host-style requests, so I had path-style access
> enabled. After adding some extra debug logging and building hadoop-aws
> locally, I found that the staging committer was always adding a '/' prefix
> to my key.
>
> So instead of part-r-00000-4ead11c8-bc20-4dee-9753-1b1f1ae4e578 I would
> end up with /part-r-00000-4ead11c8-bc20-4dee-9753-1b1f1ae4e578. I traced it
> to getFinalKey in StagingCommitter.java, which had the following code:
>
>     return getS3KeyPrefix(context) + "/"
>         + Paths.addUUID(relative, getUUID());
> If getS3KeyPrefix(context) is "", then we end up with /part-r... as the key.
>
> I made the following change locally, which resolved the issue:
>
>
> ---
> diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
> index 59114f7ab73..6d76cf2d419 100644
> --- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
> +++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
> @@ -365,11 +365,16 @@ public Path getTempTaskAttemptPath(TaskAttemptContext context) {
>     * @return the S3 key where the file will be uploaded
>     */
>    protected String getFinalKey(String relative, JobContext context) {
> +    StringBuilder sb = new StringBuilder();
> +    final String pfx = getS3KeyPrefix(context);
> +    if (!pfx.isEmpty()) {
> +      sb.append(pfx).append('/');
> +    }
> +
>      if (uniqueFilenames) {
> -      return getS3KeyPrefix(context) + "/"
> -          + Paths.addUUID(relative, getUUID());
> +      return sb.append(Paths.addUUID(relative, getUUID())).toString();
>      } else {
> -      return getS3KeyPrefix(context) + "/" + relative;
> +      return sb.append(relative).toString();
>      }
>    }
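
The behaviour described above can be reproduced outside Hadoop with a small
standalone sketch. FinalKeySketch, buggyKey and fixedKey are hypothetical
names used only for illustration; they are not part of hadoop-aws, and the
prefix is passed in directly rather than being obtained from
getS3KeyPrefix(context). The UUID handling (Paths.addUUID) is left out because
it does not affect the leading '/'.

```java
// Sketch only: these names are hypothetical and not part of hadoop-aws.
// The prefix argument stands in for the value of getS3KeyPrefix(context).
final class FinalKeySketch {

  private FinalKeySketch() {
  }

  /** Mirrors the original concatenation: a '/' is always inserted. */
  static String buggyKey(String prefix, String relative) {
    return prefix + "/" + relative;
  }

  /** Mirrors the patched logic: '/' is added only after a non-empty prefix. */
  static String fixedKey(String prefix, String relative) {
    StringBuilder sb = new StringBuilder();
    if (!prefix.isEmpty()) {
      sb.append(prefix).append('/');
    }
    return sb.append(relative).toString();
  }

  public static void main(String[] args) {
    String relative = "part-r-00000";
    // Empty prefix: the old code yields a key with a leading '/'.
    System.out.println(buggyKey("", relative));
    // Empty prefix: the patched code yields the bare relative name.
    System.out.println(fixedKey("", relative));
    // Non-empty prefix: both variants agree.
    System.out.println(fixedKey("output/dir", relative));
  }
}
```

With a non-empty prefix both variants build the same key, so under these
assumptions the change should only affect jobs whose key prefix is empty,
such as those writing to the root of a bucket.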