Repository: spark
Updated Branches:
refs/heads/master 2f51e7235 -> 4d114fc9a
[SPARK-25366][SQL] Zstd and brotli CompressionCodec are not supported for
parquet files
## What changes were proposed in this pull request?
Hadoop 2.6 and Hadoop 2.7 do not contain the zstd and brotli compression codecs,
and Hadoop 3.1 contains only the zstd compression codec.
So I think we should remove zstd and brotli for the time being.
**set `spark.sql.parquet.compression.codec=brotli`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found
at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
**set `spark.sql.parquet.compression.codec=zstd`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found
at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
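Both failures above are class-lookup errors: the codec class named in the exception is not on the classpath. A minimal sketch of a pre-flight check follows, assuming only the two class names shown in the stack traces. `CodecCheck`, `CODEC_CLASSES`, and `isClassAvailable` are hypothetical names used for illustration and are not part of Spark, Parquet, or Hadoop. The class loader Parquet uses at write time may differ from the one used here, so treat the result as a hint rather than a guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecCheck {
    // Codec name -> Hadoop codec class name, copied from the stack traces above.
    public static final Map<String, String> CODEC_CLASSES = new LinkedHashMap<>();
    static {
        CODEC_CLASSES.put("brotli", "org.apache.hadoop.io.compress.BrotliCodec");
        CODEC_CLASSES.put("zstd", "org.apache.hadoop.io.compress.ZStandardCodec");
    }

    // Returns true if the class can be located by this class's loader.
    // The class is looked up without being initialized.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, CodecCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Report, for each codec, whether its class is visible on this classpath.
        for (Map.Entry<String, String> entry : CODEC_CLASSES.entrySet()) {
            String status = isClassAvailable(entry.getValue()) ? "available" : "missing";
            System.out.println(entry.getKey() + ": " + status);
        }
    }
}
```

Running this with the same classpath as the Spark application gives a quick indication of whether setting `spark.sql.parquet.compression.codec` to `brotli` or `zstd` is likely to hit the exceptions shown above.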
## How was this patch tested?
Existing unit tests.
Closes #22358 from 10110346/notsupportzstdandbrotil.
Authored-by: liuxian <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4d114fc9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4d114fc9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4d114fc9
Branch: refs/heads/master
Commit: 4d114fc9a2cb0be7256560bc8b2e4ce72adb7a7f
Parents: 2f51e72
Author: liuxian <[email protected]>
Authored: Thu Sep 20 16:53:48 2018 -0500
Committer: Sean Owen <[email protected]>
Committed: Thu Sep 20 16:53:48 2018 -0500
----------------------------------------------------------------------
docs/sql-programming-guide.md | 2 ++
1 file changed, 2 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/4d114fc9/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index d2e3ee3..8ec4865 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+ Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
+ `BrotliCodec` to be installed.
</td>
</tr>
<tr>