[
https://issues.apache.org/jira/browse/SPARK-50537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-50537:
-----------------------------------
Labels: pull-request-available (was: )
> Fix compression option being overwritten in df.write.parquet in SparkConnect
> Python
> -----------------------------------------------------------------------------------
>
> Key: SPARK-50537
> URL: https://issues.apache.org/jira/browse/SPARK-50537
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2, 3.4.3,
> 3.4.4, 3.5.3, 3.5.4
> Reporter: Alex Khakhlyuk
> Priority: Minor
> Labels: pull-request-available
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> There is a small bug in Spark Connect's {{DataFrameWriter}}.
>
> df.write.option("compression", "gzip").parquet(path)
> When this code is used, the specified compression option "gzip" gets
> overwritten by None. This happens because the {{parquet()}} function has a
> default {{compression=None}} parameter, which is passed directly to
> {{self.option("compression", compression)}} and unconditionally replaces
> any previously set value.
> The Spark Connect server then receives a request without a specified
> compression option and uses "snappy" compression by default instead.
> The fix is to use {{self._set_opts(compression=compression)}} instead.
> This helper skips {{None}} values, so an earlier {{.option()}} call is
> preserved; it is the method of setting options used by most other
> {{DataFrameWriter}} APIs.
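> The difference between the two option-setting styles can be sketched with
> a minimal stand-in class (not the actual Spark Connect code; the class and
> method names below are illustrative only):
>
> {code:python}
> class Writer:
>     """Hypothetical mimic of the DataFrameWriter option-handling pattern."""
>
>     def __init__(self):
>         self._options = {}
>
>     def option(self, key, value):
>         # Mirrors .option(): stores the value unconditionally,
>         # even when it is None.
>         self._options[key] = value
>         return self
>
>     def _set_opts(self, **opts):
>         # Mirrors the _set_opts helper: only stores non-None values,
>         # so a previously set option survives.
>         for key, value in opts.items():
>             if value is not None:
>                 self._options[key] = value
>
>     def parquet_buggy(self, compression=None):
>         # Buggy pattern: overwrites "gzip" with the default None.
>         self.option("compression", compression)
>         return self._options.get("compression")
>
>     def parquet_fixed(self, compression=None):
>         # Fixed pattern: None is skipped, "gzip" is kept.
>         self._set_opts(compression=compression)
>         return self._options.get("compression")
>
>
> buggy = Writer().option("compression", "gzip").parquet_buggy()
> fixed = Writer().option("compression", "gzip").parquet_fixed()
> # buggy is None (option lost); fixed == "gzip" (option kept)
> {code}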
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]