Re: [PR] [GH-2142] Fix STAC python unit tests for remote endpoint [sedona]

via GitHub Tue, 22 Jul 2025 15:25:39 -0700


Copilot commented on code in PR #2143:
URL: https://github.com/apache/sedona/pull/2143#discussion_r2223940238



##########
python/tests/stac/test_collection_client.py:
##########
@@ -17,6 +17,8 @@
 
 from sedona.spark.stac.client import Client
 from sedona.spark.stac.collection_client import CollectionClient
+from pyspark.sql import DataFrame
+import collections.abc

Review Comment:
   The import statement should follow PEP 8 ordering conventions. Standard 
library imports like 'collections.abc' should be placed before third-party 
imports like 'pyspark.sql'.
   ```suggestion
   import collections.abc
   
   from pyspark.sql import DataFrame
   from sedona.spark.stac.client import Client
   from sedona.spark.stac.collection_client import CollectionClient
   ```



##########
python/tests/stac/test_client.py:
##########
@@ -16,6 +16,7 @@
 # under the License.
 
 from sedona.spark.stac.client import Client
+import collections.abc

Review Comment:
   The import statement should follow PEP 8 ordering conventions. Standard 
library imports like 'collections.abc' should be placed before third-party 
imports.
   ```suggestion
   import collections.abc
   from sedona.spark.stac.client import Client
   ```



##########
python/sedona/spark/stac/collection_client.py:
##########
@@ -30,27 +30,31 @@ def get_collection_url(url: str, collection_id: 
Optional[str] = None) -> str:
     """
     Constructs the collection URL based on the provided base URL and optional 
collection ID.
 
-    If the collection ID is provided and the URL starts with 'http' or 
'https', the collection ID
-    is appended to the URL. Otherwise, an exception is raised.
-
     Parameters:
-    - url (str): The base URL of the STAC collection.
-    - collection_id (Optional[str]): The optional collection ID to append to 
the URL.
+    - url (str): The base URL of the STAC collection (http, https, or local 
file path)
+    - collection_id (Optional[str]): The optional collection ID to append to 
the URL
 
     Returns:
     - str: The constructed collection URL.
 
-    Raises:
-    - ValueError: If the URL does not start with 'http' or 'https' and a 
collection ID is provided.
+    Examples:
+    - "https://stac.example.com"; + "landsat" -> 
"https://stac.example.com/collections/landsat";
+    - "http://localhost:8000"; + "sentinel2" -> 
"http://localhost:8000/collections/sentinel2";
+    - "/tmp/test.json" + "landsat" -> "/tmp/test.json/collections/landsat"
+    - "file:///tmp/test.json" + "landsat" -> 
"file:///tmp/test.json/collections/landsat"
     """
     if not collection_id:
         return url
-    elif url.startswith("http") or url.startswith("https"):
+
+    # Support web URLs (http/https)
+    if url.startswith("http") or url.startswith("https"):
+        return f"{url}/collections/{collection_id}"
+    # Support local file paths (absolute or relative) and file:// URLs
+    elif url.startswith("/") or url.startswith("file://"):
         return f"{url}/collections/{collection_id}"
     else:
-        raise ValueError(
-            "Collection ID is not used because the URL does not start with 
http or https"
-        )
+        # For testing, allow any string to be treated as a valid URL base

Review Comment:
   The catch-all else clause that treats any string as a valid URL base 
introduces unpredictable behavior. Consider adding explicit validation or at 
least logging a warning for unrecognized URL schemes.
   ```suggestion
           # Validate the URL scheme and log a warning for unrecognized schemes
           from urllib.parse import urlparse
           parsed_url = urlparse(url)
           if parsed_url.scheme not in {"http", "https", "file"}:
               logging.warning(f"Unrecognized URL scheme '{parsed_url.scheme}' 
in URL: {url}")
               return url  # Return the original URL without appending the 
collection ID
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [GH-2142] Fix STAC python unit tests for remote endpoint [sedona]

Reply via email to