[ 
https://issues.apache.org/jira/browse/HADOOP-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-18097:
---------------------------------------

    Assignee: Balaji Ganesan

> StagingCommitter getFinalKey method can add an extra / if getS3KeyPrefix 
> returns ""
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-18097
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18097
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.1
>         Environment: apache-spark 3.2 with hadoop 3.3.1 on Ubuntu 20.04
>  
>  
>  
>            Reporter: Balaji Ganesan
>            Assignee: Balaji Ganesan
>            Priority: Minor
>         Attachments: error.log
>
>
> I am trying to test staging committer against an on prem object store using 
> spark terasort and ran into this issue. All my initiate MPU were failing with 
> S3 error key not found. This object store doesn't support virtual host style 
> request, so I had path style enabled. After adding some extra debug and 
> building hadoop-aws locally, I found that staging committer was always adding 
> a '/' prefix to my key.  
>  
> So instead of part part-r-00000-4ead11c8-bc20-4dee-9753-1b1f1ae4e578 I would 
> end up with /part-r-00000-4ead11c8-bc20-4dee-9753-1b1f1ae4e578. I traced it 
> to getFinalKey in StagingCommitter.java which had the following code
>  
>  *      return getS3KeyPrefix(context) + "/"
> -          + Paths.addUUID(relative, getUUID());
> If getS3KeyPrefix(context) is "", then we end up with /part-r... as the key. 
>  
> I made the following change locally and was able to resolve the issue
>  
>  
> ---
> diff --git 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
>  
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
> index 59114f7ab73..6d76cf2d419 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
> @@ -365,11 +365,16 @@ public Path getTempTaskAttemptPath(TaskAttemptContext 
> context) {
>     * @return the S3 key where the file will be uploaded
>     */
>    protected String getFinalKey(String relative, JobContext context) {
> +    StringBuilder sb = new StringBuilder();
> +    final String pfx = getS3KeyPrefix(context);
> +    if (!pfx.isEmpty()) {
> +        sb.append(pfx).append('/');
> +    }
> +
>      if (uniqueFilenames) {
> -      return getS3KeyPrefix(context) + "/"
> -          + Paths.addUUID(relative, getUUID());
> +        return sb.append(Paths.addUUID(relative, getUUID())).toString(); 
>      } else {
> -      return getS3KeyPrefix(context) + "/" + relative;
> +        return sb.append(relative).toString();
>      }
>    }



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to