This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new 3b5b533 [SPARK-33190][INFRA][TESTS] Set upper bound of PyArrow version in GitHub Actions
3b5b533 is described below
commit 3b5b5334cc91ef146642ef2bec286bd63224b1f8
Author: HyukjinKwon <[email protected]>
AuthorDate: Tue Oct 20 17:35:09 2020 +0900
[SPARK-33190][INFRA][TESTS] Set upper bound of PyArrow version in GitHub Actions
PyArrow 2.0.0 was uploaded to PyPI today (https://pypi.org/project/pyarrow/),
and some tests fail with PyArrow 2.0.0+:
```
======================================================================
ERROR [0.774s]: test_grouped_over_window_with_key (pyspark.sql.tests.test_pandas_grouped_map.GroupedMapInPandasTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 595, in test_grouped_over_window_with_key
    .select('id', 'result').collect()
  File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 588, in collect
    sock_info = self._jdf.collectToPython()
  File "/__w/spark/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/__w/spark/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.PythonException:
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main
    process()
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 593, in process
    serializer.dump_stream(out_iter, outfile)
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 255, in dump_stream
    return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream)
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 81, in dump_stream
    for batch in iterator:
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 248, in init_stream_yield_batches
    for series in iterator:
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 426, in mapper
    return f(keys, vals)
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 170, in <lambda>
    return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 158, in wrapped
    result = f(key, pd.concat(value_series, axis=1))
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 68, in wrapper
    return f(*args, **kwargs)
  File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 590, in f
    "{} != {}".format(expected_key[i][1], window_range)
AssertionError: {'start': datetime.datetime(2018, 3, 15, 0, 0), 'end': datetime.datetime(2018, 3, 20, 0, 0)} != {'start': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>), 'end': datetime.datetime(2018, 3, 20, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>)}
```
https://github.com/apache/spark/runs/1278917457
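The assertion fails because one side of the comparison holds naive datetimes and the other holds timezone-aware ones. The sketch below only illustrates that mismatch with the standard library. It uses `timezone.utc` as a stand-in for the `StaticTzInfo 'Etc/UTC'` object in the log, and `strip_tz` is a hypothetical helper, not something from Spark or PyArrow. It does not show how SPARK-33189 resolves the issue.

```python
from datetime import datetime, timezone

# Naive values, shaped like the left-hand side of the AssertionError.
expected = {"start": datetime(2018, 3, 15), "end": datetime(2018, 3, 20)}

# Timezone-aware values, shaped like the right-hand side.
# Assumption: timezone.utc stands in for the 'Etc/UTC' tzinfo seen in the log.
actual = {
    "start": datetime(2018, 3, 15, tzinfo=timezone.utc),
    "end": datetime(2018, 3, 20, tzinfo=timezone.utc),
}


def strip_tz(window):
    """Hypothetical helper: drop tzinfo so the values compare as naive datetimes."""
    return {key: value.replace(tzinfo=None) for key, value in window.items()}


# A naive datetime never compares equal to an aware one, even for the same wall time.
mismatch = expected != actual
# Once tzinfo is dropped, the wall-clock values are identical.
match_after_strip = expected == strip_tz(actual)
```

Here `mismatch` and `match_after_strip` are both true: the two sides differ only in the attached tzinfo, not in the wall-clock values.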
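This PR proposes to set an upper bound on the PyArrow version (pyarrow<2.0.0) in the GitHub Actions build.
The bound should be removed once PyArrow 2.0.0+ is properly supported (SPARK-33189).
Why the change is needed: to make the build pass.
User-facing change: no, this is dev-only.
How it was tested: the GitHub Actions run for this build will exercise it.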
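As a rough illustration of what the 'pyarrow<2.0.0' bound admits, here is a minimal sketch that compares plain `X.Y.Z` release strings. The names `parse_version` and `below_upper_bound` are hypothetical, and this is not pip's implementation: pip follows PEP 440, which also covers pre-releases, post-releases and other forms that this sketch ignores.

```python
def parse_version(text, width=3):
    """Hypothetical helper: turn a plain 'X.Y.Z' string into a tuple of ints.

    Missing components are padded with zeros, so '2.0' compares like '2.0.0'.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def below_upper_bound(installed, upper="2.0.0"):
    """Return True when `installed` is strictly below `upper`."""
    return parse_version(installed) < parse_version(upper)


# Example versions chosen for illustration only.
allowed = [v for v in ["0.15.1", "1.0.1", "2.0.0", "2.0"] if below_upper_bound(v)]
```

With these inputs `allowed` keeps only "0.15.1" and "1.0.1", since the bound is strict and excludes 2.0.0 itself.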
Closes #30098 from HyukjinKwon/hot-fix-test.
Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit eb9966b70055a67dd02451c78ec205d913a38a42)
Signed-off-by: HyukjinKwon <[email protected]>
---
.github/workflows/build_and_test.yml | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index acef583..649ce95 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -149,7 +149,7 @@ jobs:
# PyArrow is not supported in PyPy yet, see ARROW-2651.
# TODO(SPARK-32247): scipy installation with PyPy fails for an unknown reason.
run: |
- python2.7 -m pip install numpy pyarrow pandas scipy xmlrunner
+ python2.7 -m pip install numpy 'pyarrow<2.0.0' pandas scipy xmlrunner
python2.7 -m pip list
# PyPy does not have xmlrunner
pypy3 -m pip install numpy pandas
@@ -157,7 +157,7 @@ jobs:
- name: Install Python packages (Python 3.8)
if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
run: |
- python3.8 -m pip install numpy pyarrow pandas scipy xmlrunner
+ python3.8 -m pip install numpy 'pyarrow<2.0.0' pandas scipy xmlrunner
python3.8 -m pip list
# SparkR
- name: Install R 4.0