[ https://issues.apache.org/jira/browse/HADOOP-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krzysztof Adamski updated HADOOP-17112:
---------------------------------------
Description:
When saving results through a Spark DataFrame on the latest 3.0.1-SNAPSHOT
compiled against hadoop-3.2 with the following settings:
--conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
--conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
--conf spark.hadoop.fs.s3a.committer.name=partitioned
--conf spark.hadoop.fs.s3a.committer.staging.conflict-mode=replace
we are unable to save a file when the path contains a whitespace character. It
works fine when the path has no whitespace.
I looked through the recent commits related to qualifying the path, but
couldn't find anything obvious. Is this a known bug?
!image-2020-07-03-16-08-52-340.png!
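For anyone trying to reproduce this, the settings from the description can be kept in one place and turned into a spark-submit command line. The following is only a minimal sketch: the configuration keys and values are copied from the report, but the helper names (conf_args, spark_submit_command) and the application file "job.py" are hypothetical and not part of the original report.

```python
import shlex

# Settings copied verbatim from the report above.
S3A_COMMITTER_CONF = {
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    "spark.hadoop.fs.s3a.committer.name": "partitioned",
    "spark.hadoop.fs.s3a.committer.staging.conflict-mode": "replace",
}


def conf_args(conf):
    """Flatten a settings dict into ['--conf', 'key=value', ...]."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def spark_submit_command(conf, app="job.py"):
    """Build a shell-quoted spark-submit command line.

    "job.py" is a hypothetical application file used for illustration.
    """
    return shlex.join(["spark-submit", *conf_args(conf), app])


if __name__ == "__main__":
    print(spark_submit_command(S3A_COMMITTER_CONF))
```

Running the same job twice with this command, once writing to a path with a space and once without, should be enough to compare the two behaviours described above.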
was:
When saving results through a Spark DataFrame on the latest 3.0.1-SNAPSHOT
compiled against hadoop-3.2 with the following settings:
--conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
--conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
--conf spark.hadoop.fs.s3a.committer.name=partitioned
--conf spark.hadoop.fs.s3a.committer.staging.conflict-mode=replace
we are unable to save a file when the path contains a whitespace character. It
works fine when the path has no whitespace.
I looked through the recent commits related to qualifying the path, but
couldn't find anything obvious. Is this a known bug?
!image-2020-07-03-16-08-15-852.png!
> whitespace not allowed in paths when saving files to s3a via committer
> ----------------------------------------------------------------------
>
> Key: HADOOP-17112
> URL: https://issues.apache.org/jira/browse/HADOOP-17112
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Krzysztof Adamski
> Priority: Major
> Attachments: image-2020-07-03-16-08-52-340.png
>
>
> When saving results through a Spark DataFrame on the latest 3.0.1-SNAPSHOT
> compiled against hadoop-3.2 with the following settings:
> --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
> --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
> --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
> --conf spark.hadoop.fs.s3a.committer.name=partitioned
> --conf spark.hadoop.fs.s3a.committer.staging.conflict-mode=replace
> we are unable to save a file when the path contains a whitespace character.
> It works fine when the path has no whitespace.
> I looked through the recent commits related to qualifying the path, but
> couldn't find anything obvious. Is this a known bug?
> !image-2020-07-03-16-08-52-340.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)