This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 9da1e4ca7bb [SPARK-46290][PYTHON] Change saveMode to a boolean flag for DataSourceWriter
9da1e4ca7bb is described below

commit 9da1e4ca7bb89a8b5730d9e496c378c8357e003a
Author: allisonwang-db <allison.w...@databricks.com>
AuthorDate: Thu Dec 7 09:42:04 2023 +0900

    [SPARK-46290][PYTHON] Change saveMode to a boolean flag for DataSourceWriter
    
    ### What changes were proposed in this pull request?
    
    This PR updates the `writer` method in the Python data source API from
    ```
    def writer(self, schema: StructType, saveMode: str)
    ```
    to
    ```
    def writer(self, schema: StructType, overwrite: bool)
    ```
    The motivation here is that `saveMode` offers four modes (append, overwrite, error, and ignore), but practically speaking, only append and overwrite are meaningful. DSv2 also supports only the append and overwrite modes, so Python data sources should be consistent with it.
    
    ### Why are the changes needed?
    
    To make the API simpler.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing tests
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #44216 from allisonwang-db/spark-46290-overwrite.
    
    Authored-by: allisonwang-db <allison.w...@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/datasource.py | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/datasource.py b/python/pyspark/sql/datasource.py
index 4713ca5366a..e20d44039a6 100644
--- a/python/pyspark/sql/datasource.py
+++ b/python/pyspark/sql/datasource.py
@@ -130,7 +130,7 @@ class DataSource(ABC):
             message_parameters={"feature": "reader"},
         )
 
-    def writer(self, schema: StructType, saveMode: str) -> "DataSourceWriter":
+    def writer(self, schema: StructType, overwrite: bool) -> "DataSourceWriter":
         """
         Returns a ``DataSourceWriter`` instance for writing data.
 
@@ -140,9 +140,8 @@ class DataSource(ABC):
         ----------
         schema : StructType
             The schema of the data to be written.
-        saveMode : str
-            A string identifies the save mode. It can be one of the following:
-            `append`, `overwrite`, `error`, `ignore`.
+        overwrite : bool
+            A flag indicating whether to overwrite existing data when writing to the data source.
 
         Returns
         -------


