This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new f8f0edf40155 [SPARK-55576][PYTHON][TESTS] Use the latest spark version for test_install_spark
f8f0edf40155 is described below
commit f8f0edf40155376c3561b7f9a25b490e1c3a07ad
Author: Tian Gao <[email protected]>
AuthorDate: Wed Feb 18 15:41:31 2026 -0800
[SPARK-55576][PYTHON][TESTS] Use the latest spark version for test_install_spark
### What changes were proposed in this pull request?
Instead of hard-coding the version to download, we search the release page
and use the latest released version for the install test.
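The idea can be sketched offline. This is a minimal illustration, not the committed code: `extract_latest_version` and the sample listing snippet are hypothetical, though the regex matches the one used in the patch. Note that this sketch compares versions numerically, whereas plain string `max` would mis-order e.g. 4.10.0 vs 4.9.0:

```python
import re

def extract_latest_version(listing_html):
    # Pull "spark-X.Y.Z/" directory names out of a mirror's HTML index page
    # (some mirrors use dashes instead of dots, hence [.-] in the pattern).
    versions = re.findall(r"spark-(\d+[.-]\d+[.-]\d+)/", listing_html)
    versions = [v.replace("-", ".") for v in versions]
    if not versions:
        return None
    # Compare as integer tuples so 4.10.0 sorts above 4.9.0.
    return max(versions, key=lambda v: tuple(map(int, v.split("."))))

# Fabricated fragment of a mirror index page, for illustration only:
sample = '<a href="spark-3.5.7/">spark-3.5.7/</a> <a href="spark-4.1.1/">spark-4.1.1/</a>'
print(extract_latest_version(sample))  # → 4.1.1
```

In the real test the listing is fetched from a mirror with `urllib.request.urlopen`, with a hard-coded fallback version if no mirror responds.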
### Why are the changes needed?
* Testing the latest version is more helpful for catching potential issues.
* More importantly, the pinned version was moved to the archive when a newer
micro version came out. The archive site is much slower than the main site,
which slows the download and makes the test itself unstable.
https://github.com/apache/spark/actions/runs/22058919569/job/63734082031
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested locally and it passed.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #54350 from gaogaotiantian/install-latest-spark.
Authored-by: Tian Gao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/tests/test_install_spark.py | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/tests/test_install_spark.py b/python/pyspark/tests/test_install_spark.py
index 3b4483bd84a5..b977fdf51024 100644
--- a/python/pyspark/tests/test_install_spark.py
+++ b/python/pyspark/tests/test_install_spark.py
@@ -15,10 +15,13 @@
# limitations under the License.
#
import os
+import re
import tempfile
import unittest
+import urllib.request
from pyspark.install import (
+ get_preferred_mirrors,
install_spark,
DEFAULT_HADOOP,
DEFAULT_HIVE,
@@ -29,10 +32,31 @@ from pyspark.install import (
class SparkInstallationTestCase(unittest.TestCase):
+ def get_latest_spark_version(self):
+ if "PYSPARK_RELEASE_MIRROR" in os.environ:
+ sites = [os.environ["PYSPARK_RELEASE_MIRROR"]]
+ else:
+ sites = get_preferred_mirrors()
+ # Filter out the archive sites
+ sites = [site for site in sites if "archive.apache.org" not in site]
+ for site in sites:
+ url = site + "/spark/"
+ try:
+ with urllib.request.urlopen(url) as response:
+ html = response.read().decode("utf-8")
+ versions = re.findall(r"spark-(\d+[\.-]\d+[\.-]\d+)/", html)
+ versions = [v.replace("-", ".") for v in versions]
+ return max(versions)
+ except Exception:
+ continue
+ return None
+
def test_install_spark(self):
# Test only one case. Testing this is expensive because it needs to download
- # the Spark distribution, ensure it is available at https://dlcdn.apache.org/spark/
- spark_version, hadoop_version, hive_version = checked_versions("3.5.7", "3", "2.3")
+ # the Spark distribution. We try to get the latest version, but if we can't,
+ # we just use a hard-coded version.
+ spark_version = self.get_latest_spark_version() or "4.1.1"
+ spark_version, hadoop_version, hive_version = checked_versions(spark_version, "3", "2.3")
with tempfile.TemporaryDirectory(prefix="test_install_spark") as tmp_dir:
install_spark(
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]