This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new ee347e0d956 [SPARK-41824][CONNECT][PYTHON] Ignore the doctest for explain of connect

ee347e0d956 is described below

commit ee347e0d956f49fb2a410c8e7d01185c7bd2d59d
Author: Jiaan Geng <belie...@163.com>
AuthorDate: Sat Jan 7 13:52:31 2023 +0900

    [SPARK-41824][CONNECT][PYTHON] Ignore the doctest for explain of connect

    ### What changes were proposed in this pull request?
    Currently, the output of the explain API differs between PySpark, Scala, and Connect. A DataFrame is created with
    `df = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])`
    and then `df.explain()` is executed.

    The PySpark output is shown below.
    ```
    == Physical Plan ==
    *(1) Scan ExistingRDD[age...,name...]
    ```
    But the Scala and Connect APIs output different content.
    ```
    == Physical Plan ==
    LocalTableScan [age#1148L, name#1149]
    <BLANKLINE>
    <BLANKLINE>
    ```
    A similar issue occurs when executing `df.explain(mode="formatted")`. The difference comes from implementation details of PySpark, and it would be difficult to make the outputs match, so this PR ignores the two doctests.

    ### Why are the changes needed?
    Currently, the output of the explain API differs between PySpark, Scala, and Connect. This PR ignores the two doctests.

    ### Does this PR introduce _any_ user-facing change?
    'No'.

    ### How was this patch tested?
    Manual tests.

    Closes #39436 from beliefer/SPARK-41824.
Authored-by: Jiaan Geng <belie...@163.com>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/connect/dataframe.py | 2 --
 python/pyspark/sql/dataframe.py         | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 17b88461a43..b04da916690 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -1535,8 +1535,6 @@ def _test() -> None:
     del pyspark.sql.connect.dataframe.DataFrame.drop.__doc__
     del pyspark.sql.connect.dataframe.DataFrame.join.__doc__
-    # TODO(SPARK-41824): DataFrame.explain format is different
-    del pyspark.sql.connect.dataframe.DataFrame.explain.__doc__
     del pyspark.sql.connect.dataframe.DataFrame.hint.__doc__

     # TODO(SPARK-41886): The doctest output has different order
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index e3646cd7d95..79809cbea53 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -639,7 +639,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):

         Print out the physical plan only (default).

-        >>> df.explain()
+        >>> df.explain() # doctest: +SKIP
         == Physical Plan ==
         *(1) Scan ExistingRDD[age...,name...]

@@ -657,7 +657,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):

         Print out the plans with two sections: a physical plan outline and node details

-        >>> df.explain(mode="formatted")
+        >>> df.explain(mode="formatted") # doctest: +SKIP
         == Physical Plan ==
         * Scan ExistingRDD (...)
         (1) Scan ExistingRDD [codegen id : ...]
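[Editor's note] The connect-side hunk removes the older workaround, which disabled the doctest wholesale by deleting the method's docstring in `_test()`. A sketch of that technique (the `Demo` class here is hypothetical, not Spark code):

```python
import doctest


class Demo:
    def method(self) -> None:
        """
        >>> 1 / 0  # would fail if doctest ever ran it
        0
        """


# Deleting the docstring removes the example from doctest collection
# entirely -- the technique pyspark.sql.connect's _test() applies to
# methods whose doctest output cannot match the classic API.
del Demo.method.__doc__

finder = doctest.DocTestFinder()
tests_with_examples = [t for t in finder.find(Demo) if t.examples]
print(len(tests_with_examples))  # 0: no doctests remain
```

Compared with `# doctest: +SKIP`, this drops the entire docstring (and all of its examples) at test time, which is why the commit switches to the finer-grained per-example directive.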