Yicong-Huang opened a new pull request, #4702:
URL: https://github.com/apache/texera/pull/4702
### What changes were proposed in this PR?
Adds `test_dataset_file_document.py` covering `DatasetFileDocument` in
`amber/src/main/python/pytexera/storage/dataset_file_document.py`. The tests exercise:
- `__init__` path parsing — minimal 4-segment path, nested relative path,
leading/trailing slash stripping, and rejection of paths with fewer than 4
segments.
- Environment-variable handling — missing `USER_JWT_TOKEN`, empty
`USER_JWT_TOKEN` (falsy), default vs. explicit
`FILE_SERVICE_GET_PRESIGNED_URL_ENDPOINT`.
- `get_presigned_url` — JSON-body extraction, `Authorization: Bearer ...`
header, percent-encoding of `@` and spaces in the file path, configured
endpoint dispatch, and HTTP-failure path raising `RuntimeError`.
- `read_file` — returns `BytesIO` over downloaded content, propagates
presigned-URL failures, raises on download failure, and checks that the second
`requests.get` call downloads from the URL returned by the first.
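The two-call `read_file` flow can be pinned roughly as follows. This is a minimal sketch, not the test code in this PR: `read_file_sketch` is a hypothetical stand-in that takes the HTTP getter as a parameter so a `unittest.mock.Mock` can replace it, and the `filePath` query parameter, the use of `urllib.parse.quote`, and the `status_code != 200` check are assumptions about how the module behaves.

```python
from io import BytesIO
from unittest.mock import Mock
from urllib.parse import quote


def read_file_sketch(http_get, endpoint, file_path, token):
    """Hypothetical stand-in for DatasetFileDocument.read_file.

    `http_get` plays the role of requests.get so a Mock can be injected.
    The "filePath" query parameter name is an assumption, not the module's API.
    """
    # First call: ask the file service for a presigned URL.
    # quote() keeps "/" but encodes "@" as %40 and spaces as %20.
    resp = http_get(
        f"{endpoint}?filePath={quote(file_path)}",
        headers={"Authorization": f"Bearer {token}"},
    )
    if resp.status_code != 200:
        raise RuntimeError(f"presigned URL request failed: HTTP {resp.status_code}")
    presigned_url = resp.json().get("presignedUrl")
    # Second call: download the bytes from whatever URL the first call returned.
    download = http_get(presigned_url)
    if download.status_code != 200:
        raise RuntimeError(f"download failed: HTTP {download.status_code}")
    return BytesIO(download.content)


# Two canned responses, handed out in order by successive calls to fake_get.
presign_resp = Mock(status_code=200)
presign_resp.json.return_value = {"presignedUrl": "https://example.invalid/obj"}
download_resp = Mock(status_code=200, content=b"hello")
fake_get = Mock(side_effect=[presign_resp, download_resp])

buf = read_file_sketch(fake_get, "http://svc/presign", "dir/a@b c.csv", "tok")
assert buf.read() == b"hello"

first, second = fake_get.call_args_list
assert first.args[0].endswith("filePath=dir/a%40b%20c.csv")
assert first.kwargs["headers"]["Authorization"] == "Bearer tok"
assert second.args[0] == "https://example.invalid/obj"
```

Passing a list as `side_effect` makes each call return the next canned response, which is what lets the test assert on the arguments of both calls in order. Against the real module, the same idea would typically be applied by patching the `requests.get` that the module calls, for example with `unittest.mock.patch`.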
### Any related issues, documentation, discussions?
Closes #4701.
Potential bug noted while reading the module (not pinned by these tests):
`get_presigned_url` does `response.json().get("presignedUrl")`, so a 200
response that omits the `presignedUrl` field silently returns `None` instead of
raising. `read_file` then calls `requests.get(None)`, so callers see an opaque
invalid-URL error from `requests` rather than the explicit `RuntimeError` raised
on the non-200 path. It is worth deciding whether to raise here instead.
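One possible shape for the "raise instead" option is sketched below. It assumes a `requests`-style response object exposing `status_code` and `.json()`; `extract_presigned_url` is a hypothetical helper name, not a function in the module, and the exact status check used by `get_presigned_url` is an assumption.

```python
from unittest.mock import Mock


def extract_presigned_url(response):
    """Hypothetical stricter variant of the extraction step in get_presigned_url.

    A 200 response without a usable "presignedUrl" is treated like a non-200
    status: it raises RuntimeError instead of returning None.
    """
    if response.status_code != 200:
        raise RuntimeError(f"presigned URL request failed: HTTP {response.status_code}")
    url = response.json().get("presignedUrl")
    if not url:  # also rejects an empty string, not only a missing key
        raise RuntimeError("presigned URL response is missing the 'presignedUrl' field")
    return url


# A 200 response whose body omits the field; the current code would return None here.
ok_but_empty = Mock(status_code=200)
ok_but_empty.json.return_value = {}
try:
    extract_presigned_url(ok_but_empty)
except RuntimeError as err:
    print(err)  # presigned URL response is missing the 'presignedUrl' field
```

With this shape, both failure modes surface as the same exception type, so `read_file` never reaches `requests.get(None)`.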
### How was this PR tested?
```
cd amber/src/main/python
ruff check pytexera/storage/test_dataset_file_document.py
ruff format --check pytexera/storage/test_dataset_file_document.py
python -m pytest pytexera/storage/test_dataset_file_document.py  # 17 pass / 0 fail
```
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)
--
This is an automated message from the Apache Git Service.