[
https://issues.apache.org/jira/browse/HADOOP-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152089#comment-17152089
]
Steve Loughran commented on HADOOP-17112:
-----------------------------------------
Looks like a marshalling bug in the creation of the SinglePendingCommit file in
CommitOperations.uploadFileToPendingCommit():
path.toString() is used to create the string to save, when it should be
path.toUri().toString().
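To illustrate why the two string forms differ, here is a minimal sketch using only java.net.URI, not the Hadoop classes themselves. It assumes that Path.toString() yields the unescaped form and that the saved string is later parsed back as a URI; the bucket name, path, class name and helper methods are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriMarshallingSketch {

    /**
     * Build a URI string from components. The multi-argument URI
     * constructor percent-encodes characters that are illegal in a
     * path, such as spaces.
     */
    static String escaped(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Return true if the single-argument URI parser accepts the string. */
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical destination containing whitespace.
        String rawPath = "/output/dir with space/part-0000.parquet";

        // Unescaped form: what a plain string concatenation would save.
        String unescaped = "s3a://example-bucket" + rawPath;
        // Escaped form: what a URI-based toString() would save.
        String escapedForm = escaped("s3a", "example-bucket", rawPath);

        System.out.println(unescaped + " parses: " + parsesAsUri(unescaped));
        System.out.println(escapedForm + " parses: " + parsesAsUri(escapedForm));
    }
}
```

The unescaped string is rejected by the URI parser because of the space, while the escaped one round-trips, which would be consistent with the failure only showing up for paths containing whitespace.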
There is no way I'm going to go near this code in the next week, and even if I
did I would be left trying to chase down a reviewer.
Do you fancy having a go at it? A new test should go into
ITestCommitOperations, and the hadoop-aws patch policy of "tell us the AWS region
you ran the module's 'mvn verify' suite against" will apply, I'm afraid.
> whitespace not allowed in paths when saving files to s3a via committer
> ----------------------------------------------------------------------
>
> Key: HADOOP-17112
> URL: https://issues.apache.org/jira/browse/HADOOP-17112
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.2.0
> Reporter: Krzysztof Adamski
> Priority: Major
> Attachments: image-2020-07-03-16-08-52-340.png
>
>
> When saving results through spark dataframe on latest 3.0.1-snapshot compiled
> against hadoop-3.2 with the following specs
> --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
> --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
> --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
> --conf spark.hadoop.fs.s3a.committer.name=partitioned
> --conf spark.hadoop.fs.s3a.committer.staging.conflict-mode=replace
> we are unable to save a file with a whitespace character in its path. It
> works fine without one.
> I was looking into the recent commits with regard to qualifying the path,
> but couldn't find anything obvious. Is this a known bug?
> !image-2020-07-03-16-08-52-340.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)