This is an automated email from the ASF dual-hosted git repository.

allisonwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 633419a8c7a3 [SPARK-51989][PYTHON] Add missing Filter subclasses to __all__ list in datasource
633419a8c7a3 is described below

commit 633419a8c7a342f7cd93f84e1241adee1e1195f0
Author: Allison Wang <allison.w...@databricks.com>
AuthorDate: Tue May 6 11:12:17 2025 -0700

    [SPARK-51989][PYTHON] Add missing Filter subclasses to __all__ list in datasource

    ### What changes were proposed in this pull request?

    This PR adds missing Filter subclasses to __all__ list in pyspark.sql.datasource.

    ### Why are the changes needed?

    To improve python data source filter pushdown

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    ### Was this patch authored or co-authored using generative AI tooling?

    Closes #50782 from allisonwang-db/spark-51989-missing-filter.

    Authored-by: Allison Wang <allison.w...@databricks.com>
    Signed-off-by: Allison Wang <allison.w...@databricks.com>
---
 python/pyspark/sql/datasource.py | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/datasource.py b/python/pyspark/sql/datasource.py
index 60ae85998c39..c9704ff9f259 100644
--- a/python/pyspark/sql/datasource.py
+++ b/python/pyspark/sql/datasource.py
@@ -53,6 +53,18 @@ __all__ = [
     "WriterCommitMessage",
     "Filter",
     "EqualTo",
+    "EqualNullSafe",
+    "GreaterThan",
+    "GreaterThanOrEqual",
+    "LessThan",
+    "LessThanOrEqual",
+    "In",
+    "IsNull",
+    "IsNotNull",
+    "Not",
+    "StringStartsWith",
+    "StringEndsWith",
+    "StringContains",
 ]

@@ -966,7 +978,7 @@ class DataSourceWriter(ABC):
 class DataSourceArrowWriter(DataSourceWriter):
     """
-    A base class for data source writers that process data using PyArrow’s `RecordBatch`.
+    A base class for data source writers that process data using PyArrow's `RecordBatch`.

     Unlike :class:`DataSourceWriter`, which works with an iterator of Spark Rows, this class
     is optimized for using the Arrow format when writing data. It can offer better performance
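
For readers unfamiliar with what listing a name in `__all__` changes, the following is a minimal sketch and not code from this commit. It builds a hypothetical stand-in module (`demo_datasource`, with placeholder classes that only borrow the names `Filter`, `EqualTo` and `GreaterThan`) and shows that `from module import *` binds only the names listed in `__all__`. The helper `star_import_names` is likewise an assumption made for illustration.

```python
import sys
import types

# Hypothetical stand-in for a module such as pyspark.sql.datasource.
# The class bodies are placeholders, not the real Spark implementations.
SOURCE = '''
__all__ = ["Filter", "EqualTo"]

class Filter: ...
class EqualTo(Filter): ...
class GreaterThan(Filter): ...   # defined, but not listed in __all__
'''


def star_import_names(source, module_name="demo_datasource"):
    """Return the names that `from <module_name> import *` would bind."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)          # populate the throwaway module
    sys.modules[module_name] = module      # make it importable by name
    try:
        namespace = {}
        exec(f"from {module_name} import *", namespace)
    finally:
        del sys.modules[module_name]       # leave no trace behind
    # exec() injects __builtins__ into the namespace; drop dunder entries.
    return {name for name in namespace if not name.startswith("__")}


# Before: GreaterThan exists in the module but is not exported by `import *`.
before = star_import_names(SOURCE)

# After: the same module with GreaterThan appended to __all__.
after = star_import_names(
    SOURCE.replace('"EqualTo"]', '"EqualTo", "GreaterThan"]')
)

print(sorted(before))
print(sorted(after))
```

Explicit imports such as `from module import GreaterThan` work whether or not the name appears in `__all__`; the list affects star imports and is commonly read by documentation and linting tools as the declared public API.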