spark git commit: [SPARK-25366][SQL] Zstd and brotli CompressionCodec are not supported for parquet files

srowen Thu, 20 Sep 2018 14:54:02 -0700

Repository: spark
Updated Branches:
  refs/heads/master 2f51e7235 -> 4d114fc9a



[SPARK-25366][SQL] Zstd and brotli CompressionCodec are not supported for parquet files

## What changes were proposed in this pull request?
Hadoop 2.6 and Hadoop 2.7 do not contain the zstd or brotli compression codecs, and Hadoop 3.1 contains only the zstd compression codec.
So I think we should remove zstd and brotli for the time being.

**set `spark.sql.parquet.compression.codec=brotli`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found
        at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
        at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
        at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
        at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)

**set `spark.sql.parquet.compression.codec=zstd`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found
        at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
        at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
        at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
        at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
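
Both traces report that a Hadoop codec class could not be found, so one way to catch the problem before a write job starts is to probe the classpath for those classes. The following is only a minimal sketch under that assumption: `CodecCheck` and `isClassAvailable` are hypothetical names, not part of Spark or Parquet, and the class loader used here may differ from the one Parquet's `CodecFactory` consults at runtime.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spark or Parquet.
final class CodecCheck {
    // Codec name -> Hadoop codec class, as named in the two stack traces.
    static final Map<String, String> CODEC_CLASSES = new LinkedHashMap<>();
    static {
        CODEC_CLASSES.put("brotli", "org.apache.hadoop.io.compress.BrotliCodec");
        CODEC_CLASSES.put("zstd", "org.apache.hadoop.io.compress.ZStandardCodec");
    }

    // Returns true if the class can be loaded; the class is not initialized.
    static boolean isClassAvailable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = CodecCheck.class.getClassLoader();
        }
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Report, for each codec, whether its class is on the classpath.
        for (Map.Entry<String, String> entry : CODEC_CLASSES.entrySet()) {
            String status = isClassAvailable(entry.getValue()) ? "found" : "missing";
            System.out.println(entry.getKey() + " (" + entry.getValue() + "): " + status);
        }
    }
}
```

Running `main` with the same classpath as the Spark application indicates whether setting `spark.sql.parquet.compression.codec` to `brotli` or `zstd` is likely to hit the `BadConfigurationException` shown above.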

## How was this patch tested?
Existing unit tests.

Closes #22358 from 10110346/notsupportzstdandbrotil.

Authored-by: liuxian <[email protected]>
Signed-off-by: Sean Owen <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4d114fc9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4d114fc9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4d114fc9

Branch: refs/heads/master
Commit: 4d114fc9a2cb0be7256560bc8b2e4ce72adb7a7f
Parents: 2f51e72
Author: liuxian <[email protected]>
Authored: Thu Sep 20 16:53:48 2018 -0500
Committer: Sean Owen <[email protected]>
Committed: Thu Sep 20 16:53:48 2018 -0500

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/4d114fc9/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index d2e3ee3..8ec4865 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     `parquet.compression` is specified in the table-specific options/properties, the precedence would be
     `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
     none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+    Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
+    `BrotliCodec` to be installed.
   </td>
 </tr>
 <tr>


