This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 6c4747cdfb02 [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available
6c4747cdfb02 is described below
commit 6c4747cdfb02b5ff7197f2e8b55a79a4ac082531
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Tue Jan 16 14:01:07 2024 -0800
[SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available
### What changes were proposed in this pull request?
This PR aims to skip `Pandas`-related or `PyArrow`-related tests in `pyspark.sql.tests.test_group` when those packages are not installed.
This regression was introduced by
- #44322
- #42767
### Why are the changes needed?
Since `Pandas` and `PyArrow` are optional dependencies, the tests should be skipped rather than fail when they are missing.
- https://github.com/apache/spark/actions/runs/7543495430/job/20534809039
```
======================================================================
ERROR: test_agg_func (pyspark.sql.tests.test_group.GroupTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/pandas/utils.py", line 28, in require_minimum_pandas_version
    import pandas
ModuleNotFoundError: No module named 'pandas'
```
```
======================================================================
ERROR: test_agg_func (pyspark.sql.tests.test_group.GroupTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/pandas/utils.py", line 61, in require_minimum_pyarrow_version
    import pyarrow
ModuleNotFoundError: No module named 'pyarrow'
```
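For context, the `have_pandas` flag and `pandas_requirement_message` imported from `pyspark.testing.sqlutils` follow the usual optional-dependency pattern: attempt the import once at module load, and record a flag plus a human-readable reason for `unittest.skipIf`. A minimal, self-contained sketch of that pattern (illustrative only; the names mirror the real flags but this is not the actual `sqlutils` code):

```python
import unittest

# Detect the optional dependency once at import time (illustrative stand-in
# for the detection done in pyspark.testing.sqlutils).
try:
    import pandas  # noqa: F401

    have_pandas = True
    pandas_requirement_message = None
except ImportError as e:
    have_pandas = False
    pandas_requirement_message = "Pandas must be installed: %s" % e


class ExampleTests(unittest.TestCase):
    # Skip, rather than fail, when the optional package is missing.
    @unittest.skipIf(not have_pandas, pandas_requirement_message)
    def test_needs_pandas(self):
        import pandas as pd

        self.assertEqual(1, len(pd.DataFrame({"a": [10]})))


suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Whether or not Pandas is present, the run succeeds: the test either executes or is reported as skipped with the recorded message, which is exactly the behavior this PR restores.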
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- Manually, with a Python installation that does not have Pandas.
```
$ python/run-tests.py --testnames pyspark.sql.tests.test_group
Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['python3.9', 'pypy3']
Will test the following Python tests: ['pyspark.sql.tests.test_group']
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.18
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.10.13 (f1607341da97ff5a1e93430b6e8c4af0ad1aa019, Sep 28 2023, 20:47:55)
[PyPy 7.3.13 with GCC Apple LLVM 13.1.6 (clang-1316.0.21.2.5)]
Starting test(python3.9): pyspark.sql.tests.test_group (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/ac9269b6-f0df-4d06-88b8-e5e710202b60/python3.9__pyspark.sql.tests.test_group__9zjp5i4z.log)
Starting test(pypy3): pyspark.sql.tests.test_group (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/cab6ebed-e49f-4d86-80db-0dc3928079e3/pypy3__pyspark.sql.tests.test_group__thw6hily.log)
Finished test(pypy3): pyspark.sql.tests.test_group (6s) ... 3 tests were skipped
Finished test(python3.9): pyspark.sql.tests.test_group (7s) ... 3 tests were skipped
Tests passed in 7 seconds

Skipped tests in pyspark.sql.tests.test_group with pypy3:
    test_agg_func (pyspark.sql.tests.test_group.GroupTests) ... skipped '[PACKAGE_NOT_INSTALLED] Pandas >= 1.4.4 must be installed; however, it was not found.'
    test_group_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... skipped '[PACKAGE_NOT_INSTALLED] Pandas >= 1.4.4 must be installed; however, it was not found.'
    test_order_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... skipped '[PACKAGE_NOT_INSTALLED] Pandas >= 1.4.4 must be installed; however, it was not found.'

Skipped tests in pyspark.sql.tests.test_group with python3.9:
    test_agg_func (pyspark.sql.tests.test_group.GroupTests) ... SKIP (0.000s)
    test_group_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... SKIP (0.000s)
    test_order_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... SKIP (0.000s)
```
- Manually, with a Python installation that does not have PyArrow.
```
$ python/run-tests.py --testnames pyspark.sql.tests.test_group
Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['python3.9', 'pypy3']
Will test the following Python tests: ['pyspark.sql.tests.test_group']
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.18
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.10.13 (f1607341da97ff5a1e93430b6e8c4af0ad1aa019, Sep 28 2023, 20:47:55)
[PyPy 7.3.13 with GCC Apple LLVM 13.1.6 (clang-1316.0.21.2.5)]
Starting test(pypy3): pyspark.sql.tests.test_group (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/7f1a665e-a679-467c-8ab4-a4532e0b2300/pypy3__pyspark.sql.tests.test_group__i67erhb4.log)
Starting test(python3.9): pyspark.sql.tests.test_group (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/47b90765-8ad7-4da0-aa7b-c12cd266847e/python3.9__pyspark.sql.tests.test_group__190hx0tm.log)
Finished test(python3.9): pyspark.sql.tests.test_group (6s) ... 3 tests were skipped
Finished test(pypy3): pyspark.sql.tests.test_group (7s) ... 3 tests were skipped
Tests passed in 7 seconds

Skipped tests in pyspark.sql.tests.test_group with pypy3:
    test_agg_func (pyspark.sql.tests.test_group.GroupTests) ... skipped '[PACKAGE_NOT_INSTALLED] PyArrow >= 4.0.0 must be installed; however, it was not found.'
    test_group_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... skipped '[PACKAGE_NOT_INSTALLED] PyArrow >= 4.0.0 must be installed; however, it was not found.'
    test_order_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... skipped '[PACKAGE_NOT_INSTALLED] PyArrow >= 4.0.0 must be installed; however, it was not found.'

Skipped tests in pyspark.sql.tests.test_group with python3.9:
    test_agg_func (pyspark.sql.tests.test_group.GroupTests) ... SKIP (0.000s)
    test_group_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... SKIP (0.000s)
    test_order_by_ordinal (pyspark.sql.tests.test_group.GroupTests) ... SKIP (0.000s)
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44759 from dongjoon-hyun/SPARK-46735.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/sql/tests/test_group.py | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/sql/tests/test_group.py b/python/pyspark/sql/tests/test_group.py
index 6c84bd740171..1a9b7d9d836c 100644
--- a/python/pyspark/sql/tests/test_group.py
+++ b/python/pyspark/sql/tests/test_group.py
@@ -14,14 +14,23 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+import unittest
 from pyspark.sql import Row
 from pyspark.sql import functions as sf
-from pyspark.testing.sqlutils import ReusedSQLTestCase
+from pyspark.testing.sqlutils import (
+    ReusedSQLTestCase,
+    have_pandas,
+    have_pyarrow,
+    pandas_requirement_message,
+    pyarrow_requirement_message,
+)
 from pyspark.testing import assertDataFrameEqual, assertSchemaEqual
 
 
 class GroupTestsMixin:
+    @unittest.skipIf(not have_pandas, pandas_requirement_message)  # type: ignore
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)  # type: ignore
     def test_agg_func(self):
         data = [Row(key=1, value=10), Row(key=1, value=20), Row(key=1, value=30)]
         df = self.spark.createDataFrame(data)
@@ -60,6 +69,8 @@ class GroupTestsMixin:
         # test deprecated countDistinct
         self.assertEqual(100, g.agg(functions.countDistinct(df.value)).first()[0])
 
+    @unittest.skipIf(not have_pandas, pandas_requirement_message)  # type: ignore
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)  # type: ignore
     def test_group_by_ordinal(self):
         spark = self.spark
         df = spark.createDataFrame(
@@ -119,6 +130,8 @@ class GroupTestsMixin:
         with self.assertRaises(IndexError):
             df.groupBy(10).agg(sf.sum("b"))
 
+    @unittest.skipIf(not have_pandas, pandas_requirement_message)  # type: ignore
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)  # type: ignore
     def test_order_by_ordinal(self):
         spark = self.spark
         df = spark.createDataFrame(
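The stacked decorator pair added above can be exercised in isolation. A small sketch with hard-coded stand-in flags (not the real `have_pandas`/`have_pyarrow` detection), showing that a missing dependency produces a skip rather than a failure, and that with stacked `skipIf` decorators the test is skipped if any one condition holds:

```python
import unittest

# Stand-in flags; the real code derives these from import attempts
# in pyspark.testing.sqlutils.
have_pandas = False   # pretend Pandas is missing
have_pyarrow = True   # pretend PyArrow is present


class GroupTestsSketch(unittest.TestCase):
    # Stacked skipIf decorators: the test is skipped if ANY condition is true.
    @unittest.skipIf(not have_pandas, "Pandas >= 1.4.4 must be installed")
    @unittest.skipIf(not have_pyarrow, "PyArrow >= 4.0.0 must be installed")
    def test_agg_func(self):
        self.fail("never reached while a dependency is missing")


suite = unittest.TestLoader().loadTestsFromTestCase(GroupTestsSketch)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
# The run stays green: one test, skipped with the Pandas reason, no failures.
```

This is the same mechanism the patch relies on: `unittest` records the skip reason on the test, so CI reports the requirement message instead of a `ModuleNotFoundError`.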
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]