(spark) branch master updated: [SPARK-51989][PYTHON] Add missing Filter subclasses to __all__ list in datasource

allisonwang Tue, 06 May 2025 11:12:42 -0700

This is an automated email from the ASF dual-hosted git repository.

allisonwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 633419a8c7a3 [SPARK-51989][PYTHON] Add missing Filter subclasses to __all__ list in datasource
633419a8c7a3 is described below

commit 633419a8c7a342f7cd93f84e1241adee1e1195f0
Author: Allison Wang <allison.w...@databricks.com>
AuthorDate: Tue May 6 11:12:17 2025 -0700

    [SPARK-51989][PYTHON] Add missing Filter subclasses to __all__ list in datasource
    
    ### What changes were proposed in this pull request?
    
    This PR adds the missing Filter subclasses to the `__all__` list in `pyspark.sql.datasource`.
    
    ### Why are the changes needed?
    
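    To improve Python data source filter pushdown: with the subclasses listed in `__all__`, `from pyspark.sql.datasource import *` exports them alongside `Filter` and `EqualTo`.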
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Closes #50782 from allisonwang-db/spark-51989-missing-filter.
    
    Authored-by: Allison Wang <allison.w...@databricks.com>
    Signed-off-by: Allison Wang <allison.w...@databricks.com>
---
 python/pyspark/sql/datasource.py | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/datasource.py b/python/pyspark/sql/datasource.py
index 60ae85998c39..c9704ff9f259 100644
--- a/python/pyspark/sql/datasource.py
+++ b/python/pyspark/sql/datasource.py
@@ -53,6 +53,18 @@ __all__ = [
     "WriterCommitMessage",
     "Filter",
     "EqualTo",
+    "EqualNullSafe",
+    "GreaterThan",
+    "GreaterThanOrEqual",
+    "LessThan",
+    "LessThanOrEqual",
+    "In",
+    "IsNull",
+    "IsNotNull",
+    "Not",
+    "StringStartsWith",
+    "StringEndsWith",
+    "StringContains",
 ]
 
 
@@ -966,7 +978,7 @@ class DataSourceWriter(ABC):
 
 class DataSourceArrowWriter(DataSourceWriter):
     """
-    A base class for data source writers that process data using PyArrow’s `RecordBatch`.
+    A base class for data source writers that process data using PyArrow's `RecordBatch`.
 
     Unlike :class:`DataSourceWriter`, which works with an iterator of Spark Rows, this class
     is optimized for using the Arrow format when writing data. It can offer better performance
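
The `__all__` change in the diff above matters because that list controls which names a wildcard import binds. Below is a minimal sketch of that mechanism. It uses a hypothetical stand-in module named `demo_datasource` (an assumption for illustration, not part of Spark) so that it runs without PySpark installed.

```python
import sys
import types

# Hypothetical stand-in for a module such as pyspark.sql.datasource:
# it defines two public classes but lists only one of them in __all__.
demo = types.ModuleType("demo_datasource")
exec(
    "class EqualTo: pass\n"
    "class GreaterThan: pass\n"
    "__all__ = ['EqualTo']\n",
    demo.__dict__,
)
sys.modules["demo_datasource"] = demo


def star_import_names(module_name):
    """Return the names that `from <module_name> import *` binds."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    # Drop dunder entries such as __builtins__ that exec adds to the namespace.
    return {name for name in namespace if not name.startswith("__")}


# Only the name listed in __all__ is exported.
before = star_import_names("demo_datasource")

# Adding the missing name to __all__ makes the wildcard import pick it up.
demo.__all__.append("GreaterThan")
after = star_import_names("demo_datasource")

print(sorted(before))
print(sorted(after))
```

Here `before` holds only `EqualTo`, while `after` also holds `GreaterThan`, which mirrors the effect of listing the additional Filter subclasses in `pyspark.sql.datasource.__all__`.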


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
