codeant-ai-for-open-source[bot] commented on code in PR #40637:
URL: https://github.com/apache/superset/pull/40637#discussion_r3337858625
##########
superset/commands/database/uploaders/columnar_reader.py:
##########
@@ -89,6 +90,9 @@ def _yield_files(file: FileStorage) -> Generator[IO[bytes],
None, None]:
raise DatabaseUploadFailed(_("Not a valid ZIP file"))
try:
with ZipFile(file) as zip_file:
+ # guard against decompression bombs before reading entries,
+ # mirroring the importer path
+ check_is_safe_zip(zip_file)
# check if all file types are of the same extension
Review Comment:
**Suggestion:** `check_is_safe_zip()` raises `SupersetException`, but this
reader otherwise raises `DatabaseUploadFailed` for user upload errors. Letting
`SupersetException` bubble here changes upload failures into generic 500
responses instead of the expected 4xx/422-style upload error handling. Catch
and re-raise zip-safety failures as `DatabaseUploadFailed` to preserve the
existing API error contract. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Columnar ZIP uploads hitting safety guard return HTTP 500.
- ⚠️ Upload UI misclassifies user ZIP issues as server errors.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Create a high-compression-ratio ZIP using `_make_high_ratio_zip()` from
`tests/unit_tests/commands/databases/columnar_reader_test.py:234-245`, which
writes ~1MB
of zeros into a ZIP entry.
2. Start Superset with this PR code and send `POST
/api/v1/database/<pk>/upload/` (upload
endpoint implemented in `superset/databases/api.py:63-80,1789-1796`) with
`type=columnar`
and the ZIP from step 1 as `file`.
3. The view constructs a `ColumnarReader`
(`superset/databases/api.py:68-73`) and calls
`UploadCommand(...).run()`, which in turn calls `BaseDataReader.read()` and
`ColumnarReader.file_to_dataframe()`
(`superset/commands/database/uploaders/columnar_reader.py:51-59`), leading to
`_yield_files()` (`line 76` hunk) when the filename ends with `.zip`.
4. Inside `_yield_files`, when `file_suffix == "zip"`,
`check_is_safe_zip(zip_file)` is
called (`superset/commands/database/uploaders/columnar_reader.py:33-37`),
which raises a
plain `SupersetException` if the compression ratio is too high
(`superset/utils/core.py:11-29`); this exception is not caught or wrapped in
`DatabaseUploadFailed` anywhere in `ColumnarReader` or `UploadCommand`, so
it bubbles up
to the global Flask error handler `show_unexpected_exception`
(`superset/views/error_handling.py:39-57`), which returns an HTTP 500
"generic backend
error", whereas other upload validation errors raise `DatabaseUploadFailed`
(`superset/commands/database/exceptions.py:17-19`, a `CommandException`) and
are rendered
as structured 4xx/422 errors via the `CommandException` handler
(`superset/views/error_handling.py:45-37`), creating inconsistent API error
semantics for
this specific ZIP-safety failure.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=4b03f8dc8dc44cae92c3893f3808b244&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=4b03f8dc8dc44cae92c3893f3808b244&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/database/uploaders/columnar_reader.py
**Line:** 94:96
**Comment:**
*Api Mismatch: `check_is_safe_zip()` raises `SupersetException`, but
this reader otherwise raises `DatabaseUploadFailed` for user upload errors.
Letting `SupersetException` bubble here changes upload failures into generic
500 responses instead of the expected 4xx/422-style upload error handling.
Catch and re-raise zip-safety failures as `DatabaseUploadFailed` to preserve
the existing API error contract.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=5800160daead4390d1bc0c256552851a8316486695713ebe92cd2213b7db68eb&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=5800160daead4390d1bc0c256552851a8316486695713ebe92cd2213b7db68eb&reaction=dislike'>👎</a>
##########
superset/daos/base.py:
##########
@@ -749,7 +750,10 @@ def list( # noqa: C901
else:
query = query.order_by(asc(column))
page = page
- page_size = max(page_size, 1)
+ # Clamp the page size to a sane range: at least 1, and no larger than
+ # the configured upper bound, to keep result sets bounded.
+ max_page_size = current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE",
1000)
+ page_size = min(max(page_size, 1), max_page_size)
Review Comment:
**Suggestion:** The new clamp trusts `SQLALCHEMY_DAO_MAX_PAGE_SIZE` without
validating it is at least 1. If that config is set to `0` or a negative value,
`page_size` becomes non-positive (`0` or negative), which breaks pagination
semantics and can even produce effectively unbounded queries on some databases.
Normalize the configured max to a positive integer before applying `min(...)`.
[incorrect condition logic]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ DAO-backed list endpoints can emit zero-row result pages.
- ⚠️ Negative max page size may cause unbounded DAO queries.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Override `SQLALCHEMY_DAO_MAX_PAGE_SIZE` in deployment config (e.g.
`superset_config.py`) to `0` or a negative value, which is read via
`current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE", 1000)` in
`BaseDAO.list`
(`superset/daos/base.py:755`); the default in core config is `1000`
(`superset/config.py:174`).
2. Call any DAO `list()` method that delegates to `BaseDAO.list` (e.g.
`UserDAO.list`
exercised in tests at
`tests/integration_tests/dao/base_dao_test.py:626-639`) with a
normal positive `page_size` argument such as `50`.
3. In `BaseDAO.list`, after building and ordering the query
(`superset/daos/base.py:720-732`), the page size is clamped using `page_size
=
min(max(page_size, 1), max_page_size)` (`superset/daos/base.py:756`); with
`max_page_size
== 0`, this computes `page_size = min(max(50, 1), 0) == 0`, or with
`max_page_size == -5`
it computes `page_size = -5`.
4. The DAO then applies `query = query.offset(page *
page_size).limit(page_size)`
(`superset/daos/base.py:757`), so a misconfigured non-positive
`SQLALCHEMY_DAO_MAX_PAGE_SIZE` causes `LIMIT 0` (empty results) or `LIMIT -5`
(driver-specific behavior, often treated as an effectively unbounded query
or an error),
breaking the intended "bounded page size" guarantee for all DAO-backed list
endpoints that
use `BaseDAO.list`.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=3718f5461364451080a9f419a42e8848&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=3718f5461364451080a9f419a42e8848&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/daos/base.py
**Line:** 755:756
**Comment:**
*Incorrect Condition Logic: The new clamp trusts
`SQLALCHEMY_DAO_MAX_PAGE_SIZE` without validating it is at least 1. If that
config is set to `0` or a negative value, `page_size` becomes non-positive (`0`
or negative), which breaks pagination semantics and can even produce
effectively unbounded queries on some databases. Normalize the configured max
to a positive integer before applying `min(...)`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=4f9035951e671fb772869c9d55d1d98370c9e4ab9e5dbf723c8f3eacc438f8e1&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=4f9035951e671fb772869c9d55d1d98370c9e4ab9e5dbf723c8f3eacc438f8e1&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]