rich7420 commented on code in PR #1174:
URL: https://github.com/apache/mahout/pull/1174#discussion_r2923245221
##########
qdp/qdp-python/qumat_qdp/loader.py:
##########
@@ -166,10 +166,15 @@ def source_file(self, path: str, streaming: bool = False) -> QuantumDataLoader:
For streaming=True (Phase 2b), only .parquet is supported; data is
read in chunks to reduce memory.
For streaming=False, supports .parquet, .arrow, .feather, .ipc, .npy,
.pt, .pth, .pb.
- Remote paths (s3://) are supported when the remote-io feature is enabled.
+ Remote paths (s3://, gs://) are supported when the remote-io feature is enabled.
+ Remote URL query/fragment (for example ?versionId=... or #...) is not supported.
"""
if not path or not isinstance(path, str):
raise ValueError(f"path must be a non-empty string, got {path!r}")
+ if "://" in path and ("?" in path or "#" in path):
+ raise ValueError(
+ "Remote URL query/fragment is not supported; use plain scheme://bucket/key path."
+ )
# For remote URLs, extract the key portion for extension checks.
check_path = path.split("?")[0].rsplit("/", 1)[-1] if "://" in path else path
Review Comment:
nit: Since a remote URL containing a query or fragment is already rejected on L174, the
`path.split("?")[0]` here is effectively dead code for remote URLs. Could
simplify to just `path.rsplit("/", 1)[-1]`, or add a quick comment noting it's
kept as a defensive fallback. Not a big deal either way.
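
For reference, a minimal sketch of the simplified version. `extension_check_name` is a hypothetical standalone helper made up for illustration, not a function in `loader.py`; it only mirrors the validation and extraction lines from the hunk above.

```python
def extension_check_name(path: str) -> str:
    """Hypothetical helper: return the part of `path` used for extension checks."""
    if not path or not isinstance(path, str):
        raise ValueError(f"path must be a non-empty string, got {path!r}")
    if "://" in path and ("?" in path or "#" in path):
        raise ValueError(
            "Remote URL query/fragment is not supported; "
            "use plain scheme://bucket/key path."
        )
    # Query/fragment was rejected above, so a remote path cannot contain "?"
    # at this point and the extra split("?")[0] is unnecessary.
    return path.rsplit("/", 1)[-1] if "://" in path else path
```

With this, `extension_check_name("s3://bucket/dir/data.parquet")` gives `"data.parquet"`, local paths pass through unchanged, and a remote path with `?versionId=...` still raises `ValueError` before the extraction runs.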