Copilot commented on code in PR #2143:
URL: https://github.com/apache/sedona/pull/2143#discussion_r2223940238
##########
python/tests/stac/test_collection_client.py:
##########
@@ -17,6 +17,8 @@
from sedona.spark.stac.client import Client
from sedona.spark.stac.collection_client import CollectionClient
+from pyspark.sql import DataFrame
+import collections.abc
Review Comment:
The import statement should follow PEP 8 ordering conventions. Standard
library imports like 'collections.abc' should be placed before third-party
imports like 'pyspark.sql'.
```suggestion
import collections.abc
from pyspark.sql import DataFrame
from sedona.spark.stac.client import Client
from sedona.spark.stac.collection_client import CollectionClient
```
##########
python/tests/stac/test_client.py:
##########
@@ -16,6 +16,7 @@
# under the License.
from sedona.spark.stac.client import Client
+import collections.abc
Review Comment:
The import statement should follow PEP 8 ordering conventions. Standard
library imports like 'collections.abc' should be placed before third-party
imports.
```suggestion
import collections.abc
from sedona.spark.stac.client import Client
```
##########
python/sedona/spark/stac/collection_client.py:
##########
@@ -30,27 +30,31 @@ def get_collection_url(url: str, collection_id:
Optional[str] = None) -> str:
"""
Constructs the collection URL based on the provided base URL and optional
collection ID.
- If the collection ID is provided and the URL starts with 'http' or
'https', the collection ID
- is appended to the URL. Otherwise, an exception is raised.
-
Parameters:
- - url (str): The base URL of the STAC collection.
- - collection_id (Optional[str]): The optional collection ID to append to
the URL.
+ - url (str): The base URL of the STAC collection (http, https, or local
file path)
+ - collection_id (Optional[str]): The optional collection ID to append to
the URL
Returns:
- str: The constructed collection URL.
- Raises:
- - ValueError: If the URL does not start with 'http' or 'https' and a
collection ID is provided.
+ Examples:
+ - "https://stac.example.com" + "landsat" ->
"https://stac.example.com/collections/landsat"
+ - "http://localhost:8000" + "sentinel2" ->
"http://localhost:8000/collections/sentinel2"
+ - "/tmp/test.json" + "landsat" -> "/tmp/test.json/collections/landsat"
+ - "file:///tmp/test.json" + "landsat" ->
"file:///tmp/test.json/collections/landsat"
"""
if not collection_id:
return url
- elif url.startswith("http") or url.startswith("https"):
+
+ # Support web URLs (http/https)
+ if url.startswith("http") or url.startswith("https"):
+ return f"{url}/collections/{collection_id}"
+ # Support local file paths (absolute or relative) and file:// URLs
+ elif url.startswith("/") or url.startswith("file://"):
return f"{url}/collections/{collection_id}"
else:
- raise ValueError(
- "Collection ID is not used because the URL does not start with
http or https"
- )
+ # For testing, allow any string to be treated as a valid URL base
Review Comment:
The catch-all else clause that treats any string as a valid URL base
introduces unpredictable behavior. Consider adding explicit validation or at
least logging a warning for unrecognized URL schemes.
```suggestion
# Validate the URL scheme and log a warning for unrecognized schemes
from urllib.parse import urlparse
parsed_url = urlparse(url)
if parsed_url.scheme not in {"http", "https", "file"}:
logging.warning(f"Unrecognized URL scheme '{parsed_url.scheme}'
in URL: {url}")
return url # Return the original URL without appending the
collection ID
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]