Oh, whoops, didn’t realize that wasn’t the release version, thanks!

> git clone --branch branch-3.2 https://github.com/apache/spark.git
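For anyone retracing this later, the full sequence I’m now running is roughly the following (the build flags and test runner are the same ones from my original mail further down; the JAVA_HOME path is just a placeholder for my local JDK 11 install):

```
# clone the 3.2 maintenance branch, rather than checking out the RC tag
git clone --branch branch-3.2 https://github.com/apache/spark.git
cd spark

# same build as before, skipping the JVM tests
./build/mvn -DskipTests clean package -Phive

# point at a JDK 11 install (placeholder path), then run the PySpark suite
export JAVA_HOME=/path/to/jdk/11
./python/run-tests
```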
Ah, so the old failing tests are passing now, but I am seeing failures in `pyspark.tests.test_broadcast` such as `test_broadcast_value_against_gc`, with the majority of them failing due to `ConnectionRefusedError: [Errno 61] Connection refused`. Maybe these tests are not meant to be run locally, and only in the pipeline? Also, I see this warning (which suggests notifying the maintainers):

```
Starting test(/usr/local/bin/python3): pyspark.tests.test_broadcast
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/$path/spark/common/unsafe/target/scala-2.12/classes/) to constructor java.nio.DirectByteBuffer(long,int)
```

FWIW, not sure if this matters, but the Python executable used for running these tests is Python 3.10.9 under `/usr/local/bin/python3`.
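In case it helps reproduce this, I was planning to isolate that module roughly as below; the `--testnames` and `--python-executables` options are just my reading of `python/run-tests`, so treat the exact spellings as unverified on my end:

```
# re-run only the broadcast tests against the interpreter mentioned above
# (option names are my best reading of python/run-tests, not verified here)
./python/run-tests \
  --python-executables=/usr/local/bin/python3 \
  --testnames 'pyspark.tests.test_broadcast'
```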
Best,

Adam Chhina

> On Jan 18, 2023, at 3:05 PM, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>
> Replace
>
> > git clone g...@github.com:apache/spark.git
>
> > git checkout -b spark-321 v3.2.1
>
> with
> git clone --branch branch-3.2 https://github.com/apache/spark.git
>
> This will give you branch-3.2 as of today, which I suppose is what you call upstream:
> https://github.com/apache/spark/commits/branch-3.2
> and right now all the tests in GitHub Actions are passing :)
>
>
> ons. 18. jan. 2023 kl. 18:07 skrev Sean Owen <sro...@gmail.com>:
>> Never seen those, but it's probably a difference in pandas/numpy versions.
>> You can see the current CI/CD test results in GitHub Actions. But you want
>> to use release versions, not an RC. 3.2.1 is not the latest version, and
>> it's possible the tests were actually failing in the RC.
>>
>> On Wed, Jan 18, 2023, 10:57 AM Adam Chhina <amanschh...@gmail.com> wrote:
>>> Bump,
>>>
>>> Just trying to see where I can find which tests are known to be failing for a
>>> particular release, to ensure I’m building upstream correctly following the
>>> build docs. I figured this would be the best place to ask, since it pertains to
>>> building and testing upstream (I’m also more than happy to provide a PR for any
>>> docs if required afterwards); however, if there is a more appropriate place,
>>> please let me know.
>>>
>>> Best,
>>>
>>> Adam Chhina
>>>
>>> > On Dec 27, 2022, at 11:37 AM, Adam Chhina <amanschh...@gmail.com> wrote:
>>> >
>>> > As part of an upgrade I was looking to run upstream PySpark unit tests on
>>> > `v3.2.1-rc2` before I applied some downstream patches and tested those.
>>> > However, I'm running into some issues with failing unit tests, which I'm
>>> > not sure are failing upstream or due to some step I missed in the build.
>>> >
>>> > The current failing tests (at least so far, since I believe the Python
>>> > script exits on test failure):
>>> > ```
>>> > ======================================================================
>>> > FAIL: test_train_prediction
>>> > (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
>>> > Test that error on test data improves as model is trained.
>>> > ----------------------------------------------------------------------
>>> > Traceback (most recent call last):
>>> >   File "/Users/adam/OSS/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 474, in test_train_prediction
>>> >     eventually(condition, timeout=180.0)
>>> >   File "/Users/adam/OSS/spark/python/pyspark/testing/utils.py", line 86, in eventually
>>> >     lastValue = condition()
>>> >   File "/Users/adam/OSS/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 469, in condition
>>> >     self.assertGreater(errors[1] - errors[-1], 2)
>>> > AssertionError: 1.8960983527735014 not greater than 2
>>> >
>>> > ======================================================================
>>> > FAIL: test_parameter_accuracy
>>> > (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
>>> > Test that the final value of weights is close to the desired value.
>>> > ----------------------------------------------------------------------
>>> > Traceback (most recent call last):
>>> >   File "/Users/adam/OSS/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 229, in test_parameter_accuracy
>>> >     eventually(condition, timeout=60.0, catch_assertions=True)
>>> >   File "/Users/adam/OSS/spark/python/pyspark/testing/utils.py", line 91, in eventually
>>> >     raise lastValue
>>> >   File "/Users/adam/OSS/spark/python/pyspark/testing/utils.py", line 82, in eventually
>>> >     lastValue = condition()
>>> >   File "/Users/adam/OSS/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 226, in condition
>>> >     self.assertAlmostEqual(rel, 0.1, 1)
>>> > AssertionError: 0.23052813480829393 != 0.1 within 1 places (0.13052813480829392 difference)
>>> >
>>> > ======================================================================
>>> > FAIL: test_training_and_prediction
>>> > (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
>>> > Test that the model improves on toy data with no. of batches
>>> > ----------------------------------------------------------------------
>>> > Traceback (most recent call last):
>>> >   File "/Users/adam/OSS/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 334, in test_training_and_prediction
>>> >     eventually(condition, timeout=180.0)
>>> >   File "/Users/adam/OSS/spark/python/pyspark/testing/utils.py", line 93, in eventually
>>> >     raise AssertionError(
>>> > AssertionError: Test failed due to timeout after 180 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74, 0.76, 0.78, 0.7, 0.78, 0.8, 0.74, 0.77, 0.75, 0.76, 0.76, 0.75, 0.78, 0.74, 0.64, 0.64, 0.71, 0.78, 0.76, 0.64, 0.68, 0.69, 0.72, 0.77
>>> >
>>> > ----------------------------------------------------------------------
>>> > Ran 13 tests in 661.536s
>>> >
>>> > FAILED (failures=3, skipped=1)
>>> >
>>> > Had test failures in pyspark.mllib.tests.test_streaming_algorithms with /usr/local/bin/python3; see logs.
>>> > ```
>>> >
>>> > Here's how I'm currently building Spark; I was using the
>>> > [building-spark](https://spark.apache.org/docs/3.2.1/building-spark.html) docs as a reference.
>>> > ```
>>> > > git clone g...@github.com:apache/spark.git
>>> > > git checkout -b spark-321 v3.2.1
>>> > > ./build/mvn -DskipTests clean package -Phive
>>> > > export JAVA_HOME=$(path/to/jdk/11)
>>> > > ./python/run-tests
>>> > ```
>>> >
>>> > Current Java version:
>>> > ```
>>> > java -version
>>> > openjdk version "11.0.17" 2022-10-18
>>> > OpenJDK Runtime Environment Homebrew (build 11.0.17+0)
>>> > OpenJDK 64-Bit Server VM Homebrew (build 11.0.17+0, mixed mode)
>>> > ```
>>> >
>>> > Alternatively, I've also tried simply building Spark, creating a Python 3.9 venv, installing the
>>> > requirements with `pip install -r dev/requirements.txt`, and using that interpreter to run the tests.
>>> > However, I was running into some failing pandas tests, which seemed to come from a pandas
>>> > version difference, since `requirements.txt` doesn't pin a version.
>>> >
>>> > I suppose I have a couple of questions in regard to this:
>>> > 1. Am I missing a build step to build Spark and run the PySpark unit tests?
>>> > 2. Where could I find whether an upstream test is failing for a specific release?
>>> > 3. Would it be possible to configure the `run-tests` script to run all tests regardless of test failures?
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
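(Adding below the quoted thread for completeness: a rough sketch of the Python 3.9 venv setup mentioned in my original mail. The last line is just a plain-Python way to record which pandas/numpy versions actually got installed, since `dev/requirements.txt` doesn't pin them; that should make it easier to compare against whatever the CI jobs use.)

```
# separate 3.9 environment with the dev requirements (unpinned, as noted above)
python3.9 -m venv .venv
source .venv/bin/activate
pip install -r dev/requirements.txt

# record the resolved versions for comparison with CI
python -c "import pandas, numpy; print(pandas.__version__, numpy.__version__)"
```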