This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new ee347e0d956 [SPARK-41824][CONNECT][PYTHON] Ignore the doctest for explain of connect
ee347e0d956 is described below

commit ee347e0d956f49fb2a410c8e7d01185c7bd2d59d
Author: Jiaan Geng <belie...@163.com>
AuthorDate: Sat Jan 7 13:52:31 2023 +0900

    [SPARK-41824][CONNECT][PYTHON] Ignore the doctest for explain of connect
    
    ### What changes were proposed in this pull request?
    Currently, the output of the explain API differs between PySpark, Scala,
    and Connect.
    Given a DataFrame created with
    `df = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])`
    and then executing
    `df.explain()`
    PySpark prints the output shown below.
    ```
        == Physical Plan ==
        *(1) Scan ExistingRDD[age...,name...]
    ```
    But the Scala and Connect APIs print different content:
    ```
        == Physical Plan ==
        LocalTableScan [age#1148L, name#1149]
        <BLANKLINE>
        <BLANKLINE>
    ```
    A similar issue occurs when executing `df.explain(mode="formatted")`.
    
    The exact output is an implementation detail of PySpark that would be
    difficult to match across backends, so this PR ignores the two doctests.
    
    ### Why are the changes needed?
    Currently, the output of the explain API differs between PySpark, Scala,
    and Connect.
    This PR ignores the two affected doctests.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Manual tests.
    
    Closes #39436 from beliefer/SPARK-41824.
    
    Authored-by: Jiaan Geng <belie...@163.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/connect/dataframe.py | 2 --
 python/pyspark/sql/dataframe.py         | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 17b88461a43..b04da916690 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -1535,8 +1535,6 @@ def _test() -> None:
         del pyspark.sql.connect.dataframe.DataFrame.drop.__doc__
         del pyspark.sql.connect.dataframe.DataFrame.join.__doc__
 
-        # TODO(SPARK-41824): DataFrame.explain format is different
-        del pyspark.sql.connect.dataframe.DataFrame.explain.__doc__
         del pyspark.sql.connect.dataframe.DataFrame.hint.__doc__
 
         # TODO(SPARK-41886): The doctest output has different order
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index e3646cd7d95..79809cbea53 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -639,7 +639,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
         Print out the physical plan only (default).
 
-        >>> df.explain()
+        >>> df.explain()  # doctest: +SKIP
         == Physical Plan ==
         *(1) Scan ExistingRDD[age...,name...]
 
@@ -657,7 +657,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
         Print out the plans with two sections: a physical plan outline and node details
 
-        >>> df.explain(mode="formatted")
+        >>> df.explain(mode="formatted")  # doctest: +SKIP
         == Physical Plan ==
         * Scan ExistingRDD (...)
         (1) Scan ExistingRDD [codegen id : ...]

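As background on the two doctest mechanisms this diff touches, here is a minimal, Spark-free sketch using only the standard library (the function `add` and its docstring are hypothetical illustrations, not Spark code): a `# doctest: +SKIP` directive, as added to `df.explain`'s examples above, skips a single example, while deleting an object's `__doc__`, the approach this commit removes from Connect's `_test()`, erases all of its doctests. The `...` placeholders in the PySpark expected output rely on doctest's `ELLIPSIS` matching.

```python
import doctest

def add(a, b):
    """
    >>> add(1, 2)
    3
    >>> add(1, 2)  # doctest: +SKIP
    999
    """
    return a + b

# 1. "# doctest: +SKIP" skips one example entirely; its wrong expected
#    output (999) is never checked, so it cannot fail.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add):
    runner.run(test)
print(runner.failures, runner.tries)  # the skipped example is not even tried

# 2. Deleting __doc__ removes the docstring, so the finder sees no tests at all.
del add.__doc__
print(finder.find(add))  # []

# 3. With the ELLIPSIS flag, "..." in the expected output matches arbitrary
#    text, which is how "age...,name..." hides run-to-run column IDs.
checker = doctest.OutputChecker()
want = "*(1) Scan ExistingRDD[age...,name...]\n"
got = "*(1) Scan ExistingRDD[age#14L,name#15]\n"
print(checker.check_output(want, got, doctest.ELLIPSIS))  # True
print(checker.check_output(want, got, 0))                 # False without ELLIPSIS
```

One practical difference between the two routes: `+SKIP` keeps the examples visible in rendered documentation, whereas a deleted docstring disappears everywhere.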

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
