[jira] [Commented] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

Apache Spark (JIRA) Fri, 31 Aug 2018 01:42:33 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598432#comment-16598432
 ]


Apache Spark commented on SPARK-21786:
--------------------------------------

User 'fjh100456' has created a pull request for this issue:
https://github.com/apache/spark/pull/22301

> The 'spark.sql.parquet.compression.codec' configuration doesn't take effect 
> on tables with partition field(s)
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21786
>                 URL: https://issues.apache.org/jira/browse/SPARK-21786
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jinhua Fu
>            Assignee: Jinhua Fu
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Since Hive 1.1, Hive has allowed users to set the Parquet compression codec
> via the table-level property parquet.compression. See the JIRA:
> https://issues.apache.org/jira/browse/HIVE-7858 . Spark already supports
> orc.compress for ORC, so for external users it is more straightforward to
> support both. See the Stack Overflow question:
> https://stackoverflow.com/questions/36941122/spark-sql-ignores-parquet-compression-propertie-specified-in-tblproperties
> On the Spark side, the table-level compression option, compression, was
> added by #11464 in Spark 2.0.
> We need to support both table-level confs. Users might also use the
> session-level conf spark.sql.parquet.compression.codec. The priority rule
> is: if more than one compression codec configuration is found, the
> precedence, from highest to lowest, is compression, parquet.compression,
> spark.sql.parquet.compression.codec. Acceptable values include: none,
> uncompressed, snappy, gzip, lzo.
> After the change, the rule for Parquet is consistent with the rule for ORC.
> Changes:
> 1. Added reading 'compressionCodecClassName' from parquet.compression. The
> precedence order is compression, parquet.compression,
> spark.sql.parquet.compression.codec, just like what we do in OrcOptions.
> 2. Changed spark.sql.parquet.compression.codec to accept "none".
> ParquetOptions already treats "none" as equivalent to "uncompressed", but
> the session-level conf could not be set to "none".
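
The precedence rule quoted above can be illustrated with a small standalone sketch. This is not Spark's ParquetOptions code: the helper name resolve_parquet_codec, the plain-dict inputs, the case-insensitive matching, and the error raised when nothing is configured are all assumptions made for illustration. Only the three key names, their order, the accepted values, and the "none"/"uncompressed" equivalence come from the issue text.

```python
from typing import Mapping

# Table-level keys in the precedence order stated in the issue (highest first).
TABLE_LEVEL_KEYS = ("compression", "parquet.compression")
# Session-level conf, consulted only if no table-level key is set.
SESSION_KEY = "spark.sql.parquet.compression.codec"

# Values the issue lists as acceptable.
ACCEPTED = {"none", "uncompressed", "snappy", "gzip", "lzo"}


def resolve_parquet_codec(table_options: Mapping[str, str],
                          session_conf: Mapping[str, str]) -> str:
    """Hypothetical helper: pick a codec using the stated precedence."""
    candidates = [table_options.get(key) for key in TABLE_LEVEL_KEYS]
    candidates.append(session_conf.get(SESSION_KEY))

    # The first configured value wins.
    chosen = next((c for c in candidates if c is not None), None)
    if chosen is None:
        # Simplification for this sketch: no built-in default is assumed.
        raise ValueError("no Parquet compression codec configured")

    codec = chosen.lower()  # assumption: names are matched case-insensitively
    if codec not in ACCEPTED:
        raise ValueError(f"unsupported codec: {chosen!r}")

    # "none" is treated as an alias for "uncompressed".
    return "uncompressed" if codec == "none" else codec


if __name__ == "__main__":
    session = {SESSION_KEY: "lzo"}
    # The table-level "compression" option overrides everything else.
    print(resolve_parquet_codec(
        {"compression": "gzip", "parquet.compression": "snappy"}, session))
    # With only parquet.compression set, it overrides the session conf.
    print(resolve_parquet_codec({"parquet.compression": "snappy"}, session))
    # With no table-level option, the session conf is used.
    print(resolve_parquet_codec({}, {SESSION_KEY: "none"}))
```

Under these assumptions the three calls resolve to gzip, snappy, and uncompressed respectively, which matches the order compression, parquet.compression, spark.sql.parquet.compression.codec described in the issue.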



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

