[ 
https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-27387.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Fixed in https://github.com/apache/spark/pull/24306

> Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests
> --------------------------------------------------------------------------
>
>                 Key: SPARK-27387
>                 URL: https://issues.apache.org/jira/browse/SPARK-27387
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Tests
>    Affects Versions: 2.4.1
>            Reporter: Bryan Cutler
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In PySpark unit tests, sqlutils ReusedSQLTestCase.assertPandasEqual is meant 
> to check whether two pandas.DataFrames are equal, but with later versions of 
> Pandas it can fail if the DataFrame has an array column. This method can 
> be replaced by {{assert_frame_equal}} from pandas.util.testing, which is 
> designed for this purpose and also gives a better assertion message.
> The test failure I have seen is:
>  {noformat}
> ======================================================================
> ERROR: test_supported_types (pyspark.sql.tests.test_pandas_udf_grouped_map.GroupedMapPandasUDFTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/bryan/git/spark/python/pyspark/sql/tests/test_pandas_udf_grouped_map.py", line 128, in test_supported_types
>     self.assertPandasEqual(expected1, result1)
>   File "/home/bryan/git/spark/python/pyspark/testing/sqlutils.py", line 268, in assertPandasEqual
>     self.assertTrue(expected.equals(result), msg=msg)
>   File "/home/bryan/miniconda2/envs/pa012/lib/python3.6/site-packages/pandas
> ...
>   File "pandas/_libs/lib.pyx", line 523, in pandas._libs.lib.array_equivalent_object
> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>  {noformat}
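
For illustration only, here is a minimal sketch of the comparison the issue proposes. It is not the code from the pull request. It assumes a pandas version that exposes assert_frame_equal under pandas.testing (the issue text refers to the older pandas.util.testing location), and frames_match and make_frame are hypothetical helpers invented for this example. The frames hold NumPy arrays in an object column to approximate the array column mentioned above.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(expected, result):
    """Hypothetical helper: True if assert_frame_equal accepts the pair."""
    try:
        assert_frame_equal(expected, result)
        return True
    except AssertionError:
        # assert_frame_equal reports a mismatch by raising AssertionError,
        # with a message describing where the frames differ.
        return False


def make_frame(last_value):
    # Object column whose cells are NumPy arrays, standing in for an
    # array column; only the final array element varies between frames.
    return pd.DataFrame({
        "id": [1, 2],
        "arr": [np.array([1, 2]), np.array([3, last_value])],
    })


expected = make_frame(4)
same = make_frame(4)
different = make_frame(5)

print(frames_match(expected, same))
print(frames_match(expected, different))
```

In a unittest-based test, calling assert_frame_equal(expected, result) directly would likely be enough, since the AssertionError it raises is reported as a test failure along with its message.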


