This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new f8f0edf40155 [SPARK-55576][PYTHON][TESTS] Use the latest spark version for test_install_spark
f8f0edf40155 is described below

commit f8f0edf40155376c3561b7f9a25b490e1c3a07ad
Author: Tian Gao <[email protected]>
AuthorDate: Wed Feb 18 15:41:31 2026 -0800

    [SPARK-55576][PYTHON][TESTS] Use the latest spark version for test_install_spark
    
    ### What changes were proposed in this pull request?
    
    Instead of hard-coding the version to download, we search the release page and use the latest released version for the install test.
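The version-discovery step can be sketched roughly like this (the HTML snippet, variable names, and the numeric sort key are illustrative, not taken verbatim from the patch):

```python
import re

# A minimal sketch of the idea: scan an Apache mirror's /spark/
# directory listing for released versions and pick the newest one.
# The `html` string below is a stand-in for a real mirror listing page.
html = (
    '<a href="spark-3.5.7/">spark-3.5.7/</a>\n'
    '<a href="spark-4.1.1/">spark-4.1.1/</a>\n'
)
versions = re.findall(r"spark-(\d+[\.-]\d+[\.-]\d+)/", html)
versions = [v.replace("-", ".") for v in versions]
# Compare numerically so that e.g. "3.10.0" would beat "3.9.0";
# a plain string max() would misorder double-digit components.
latest = max(versions, key=lambda v: tuple(map(int, v.split("."))))
print(latest)  # 4.1.1
```

In the test itself this result is only a best effort: if no mirror responds, a hard-coded fallback version is used instead.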
    
    ### Why are the changes needed?
    
    * Testing the latest version is more useful for catching potential issues.
    * More importantly, the pinned version gets archived once a newer micro version is released. The archive site is much slower than the main site, which slows the download and makes the test unstable.
    
    https://github.com/apache/spark/actions/runs/22058919569/job/63734082031
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Tested locally and it passed.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #54350 from gaogaotiantian/install-latest-spark.
    
    Authored-by: Tian Gao <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 python/pyspark/tests/test_install_spark.py | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/tests/test_install_spark.py b/python/pyspark/tests/test_install_spark.py
index 3b4483bd84a5..b977fdf51024 100644
--- a/python/pyspark/tests/test_install_spark.py
+++ b/python/pyspark/tests/test_install_spark.py
@@ -15,10 +15,13 @@
 # limitations under the License.
 #
 import os
+import re
 import tempfile
 import unittest
+import urllib.request
 
 from pyspark.install import (
+    get_preferred_mirrors,
     install_spark,
     DEFAULT_HADOOP,
     DEFAULT_HIVE,
@@ -29,10 +32,31 @@ from pyspark.install import (
 
 
 class SparkInstallationTestCase(unittest.TestCase):
+    def get_latest_spark_version(self):
+        if "PYSPARK_RELEASE_MIRROR" in os.environ:
+            sites = [os.environ["PYSPARK_RELEASE_MIRROR"]]
+        else:
+            sites = get_preferred_mirrors()
+        # Filter out the archive sites
+        sites = [site for site in sites if "archive.apache.org" not in site]
+        for site in sites:
+            url = site + "/spark/"
+            try:
+                with urllib.request.urlopen(url) as response:
+                    html = response.read().decode("utf-8")
+                    versions = re.findall(r"spark-(\d+[\.-]\d+[\.-]\d+)/", html)
+                    versions = [v.replace("-", ".") for v in versions]
+                    return max(versions)
+            except Exception:
+                continue
+        return None
+
     def test_install_spark(self):
         # Test only one case. Testing this is expensive because it needs to download
-        # the Spark distribution, ensure it is available at https://dlcdn.apache.org/spark/
-        spark_version, hadoop_version, hive_version = checked_versions("3.5.7", "3", "2.3")
+        # the Spark distribution. We try to get the latest version, but if we can't,
+        # we just use a hard-coded version.
+        spark_version = self.get_latest_spark_version() or "4.1.1"
+        spark_version, hadoop_version, hive_version = checked_versions(spark_version, "3", "2.3")
 
         with tempfile.TemporaryDirectory(prefix="test_install_spark") as tmp_dir:
             install_spark(


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
