This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 9da1e4ca7bb [SPARK-46290][PYTHON] Change saveMode to a boolean flag
for DataSourceWriter
9da1e4ca7bb is described below
commit 9da1e4ca7bb89a8b5730d9e496c378c8357e003a
Author: allisonwang-db <[email protected]>
AuthorDate: Thu Dec 7 09:42:04 2023 +0900
[SPARK-46290][PYTHON] Change saveMode to a boolean flag for DataSourceWriter
### What changes were proposed in this pull request?
This PR updates the `writer` method in the Python data source API from
```
def writer(self, schema: StructType, saveMode: str)
```
to
```
def writer(self, schema: StructType, overwrite: bool)
```
The motivation is that `saveMode` offers four modes (append, overwrite,
error, and ignore), but in practice only append and overwrite are
meaningful. DSv2 also supports only the append and overwrite modes, so
Python data sources should be consistent with it.
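As a rough illustration of the new signature, the sketch below shows how a
data source might branch on the `overwrite` flag. It deliberately does not
import PySpark: `InMemoryDataSource`, `InMemoryWriter`, and the `store` list
are hypothetical stand-ins invented for this example, and the sketch ignores
partitioning and the commit protocol that a real writer would need.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


class InMemoryWriter:
    """Hypothetical stand-in for a writer; not the PySpark DataSourceWriter."""

    def __init__(self, store: List[Row], overwrite: bool) -> None:
        self.store = store
        self.overwrite = overwrite

    def write(self, rows: Iterable[Row]) -> int:
        # Materialize the input first so a failing iterator cannot leave
        # the store cleared but empty.
        new_rows = list(rows)
        if self.overwrite:
            # overwrite=True: discard existing data before writing.
            self.store.clear()
        # overwrite=False: plain append.
        self.store.extend(new_rows)
        return len(new_rows)


class InMemoryDataSource:
    """Hypothetical data source mirroring the updated `writer` signature."""

    def __init__(self) -> None:
        self.store: List[Row] = []

    def writer(self, schema: Optional[Any], overwrite: bool) -> InMemoryWriter:
        # `schema` is accepted only to mirror the signature; this sketch
        # does not validate rows against it.
        return InMemoryWriter(self.store, overwrite)


source = InMemoryDataSource()
source.writer(schema=None, overwrite=False).write([{"id": 1}])
source.writer(schema=None, overwrite=False).write([{"id": 2}])
appended = list(source.store)

source.writer(schema=None, overwrite=True).write([{"id": 3}])
overwritten = list(source.store)
```

With a boolean flag the writer only has to distinguish two cases, instead of
validating a string against four possible save modes.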
### Why are the changes needed?
To make the API simpler.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44216 from allisonwang-db/spark-46290-overwrite.
Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/sql/datasource.py | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/python/pyspark/sql/datasource.py b/python/pyspark/sql/datasource.py
index 4713ca5366a..e20d44039a6 100644
--- a/python/pyspark/sql/datasource.py
+++ b/python/pyspark/sql/datasource.py
@@ -130,7 +130,7 @@ class DataSource(ABC):
message_parameters={"feature": "reader"},
)
- def writer(self, schema: StructType, saveMode: str) -> "DataSourceWriter":
+ def writer(self, schema: StructType, overwrite: bool) -> "DataSourceWriter":
"""
Returns a ``DataSourceWriter`` instance for writing data.
@@ -140,9 +140,8 @@ class DataSource(ABC):
----------
schema : StructType
The schema of the data to be written.
- saveMode : str
- A string identifies the save mode. It can be one of the following:
- `append`, `overwrite`, `error`, `ignore`.
+ overwrite : bool
+ A flag indicating whether to overwrite existing data when writing to the data source.
Returns
-------