[ 
https://issues.apache.org/jira/browse/SPARK-50537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-50537:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix compression option being overwritten in df.write.parquet in SparkConnect 
> Python
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-50537
>                 URL: https://issues.apache.org/jira/browse/SPARK-50537
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2, 3.4.3, 
> 3.4.4, 3.5.3, 3.5.4
>            Reporter: Alex Khakhlyuk
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> There is a small bug in Spark Connect's {{DataFrameWriter}}:
>  
> df.write.option("compression", "gzip").parquet(path)
> When this code is run, the user-specified compression option "gzip" gets 
> overwritten by None. This happens because the {{parquet()}} function has a 
> default {{compression=None}} parameter, which it passes straight through 
> to {{self.option("compression", compression)}}, clobbering any previously 
> set value.
> The Spark Connect server then receives a request without a compression 
> option and falls back to the default "snappy" compression instead.
> The fix is to use {{self._set_opts(compression=compression)}}, which only 
> sets options whose values are not None. This method of setting options is 
> already used by most other {{DataFrameWriter}} APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
