Alex Khakhlyuk created SPARK-50537:
--------------------------------------
Summary: Fix compression option being overwritten in
df.write.parquet in SparkConnect Python
Key: SPARK-50537
URL: https://issues.apache.org/jira/browse/SPARK-50537
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.5.0, 3.5.1, 3.5.2,
3.5.3, 3.5.4, 4.0.0
Reporter: Alex Khakhlyuk
There is a small bug in Spark Connect's {{{}DataFrameWriter{}}}.
df.write.option("compression", "gzip").parquet(path)
When this code runs, the specified "gzip" compression gets overwritten by
None. This happens because the {{parquet()}} function has a default
{{compression=None}} parameter that is passed straight to
{{{}self.option("compression", compression){}}}, clobbering any value set
earlier in the chain.
The Spark Connect server then receives a request without a specified
compression option and uses "snappy" compression by default instead.
The fix is to call {{{}self._set_opts(compression=compression){}}} instead,
which filters out parameters whose value is None. Most other
{{DataFrameWriter}} APIs already set options this way.
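The failure mode can be sketched without Spark at all. The toy {{Writer}} class below is a hypothetical stand-in (not the actual Spark Connect code): {{option()}} stores a value unconditionally, while a {{_set_opts()}}-style helper skips None, mirroring the proposed fix.

```python
# Minimal sketch of the bug pattern; Writer is a made-up class,
# not PySpark's DataFrameWriter.
class Writer:
    def __init__(self):
        self._options = {}

    def option(self, key, value):
        # Stores the value unconditionally -- even when it is None.
        self._options[key] = value
        return self

    def _set_opts(self, **opts):
        # Only stores options that were actually provided (non-None),
        # mirroring the fix proposed in this report.
        for key, value in opts.items():
            if value is not None:
                self._options[key] = value

    def parquet_buggy(self, compression=None):
        # Buggy pattern: the None default overwrites "gzip" set earlier,
        # so the server never sees a compression option.
        self.option("compression", compression)
        return self._options

    def parquet_fixed(self, compression=None):
        # Fixed pattern: None is filtered out, so "gzip" survives.
        self._set_opts(compression=compression)
        return self._options


buggy = Writer().option("compression", "gzip").parquet_buggy()
fixed = Writer().option("compression", "gzip").parquet_fixed()
print(buggy["compression"])  # None -- server falls back to its default
print(fixed["compression"])  # gzip
```

With the buggy pattern the request carries {{compression=None}}, so the server silently applies its own default ("snappy"); with the fixed pattern the user's "gzip" choice reaches the server.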
--
This message was sent by Atlassian Jira
(v8.20.10#820010)