codeant-ai-for-open-source[bot] commented on code in PR #40637:
URL: https://github.com/apache/superset/pull/40637#discussion_r3337858625


##########
superset/commands/database/uploaders/columnar_reader.py:
##########
@@ -89,6 +90,9 @@ def _yield_files(file: FileStorage) -> Generator[IO[bytes], 
None, None]:
                 raise DatabaseUploadFailed(_("Not a valid ZIP file"))
             try:
                 with ZipFile(file) as zip_file:
+                    # guard against decompression bombs before reading entries,
+                    # mirroring the importer path
+                    check_is_safe_zip(zip_file)
                     # check if all file types are of the same extension

Review Comment:
   **Suggestion:** `check_is_safe_zip()` raises `SupersetException`, but this 
reader otherwise raises `DatabaseUploadFailed` for user upload errors. Letting 
`SupersetException` bubble here changes upload failures into generic 500 
responses instead of the expected 4xx/422-style upload error handling. Catch 
and re-raise zip-safety failures as `DatabaseUploadFailed` to preserve the 
existing API error contract. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Columnar ZIP uploads hitting safety guard return HTTP 500.
   - ⚠️ Upload UI misclassifies user ZIP issues as server errors.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create a high-compression-ratio ZIP using `_make_high_ratio_zip()` from
   `tests/unit_tests/commands/databases/columnar_reader_test.py:234-245`, which 
writes ~1MB
   of zeros into a ZIP entry.
   
   2. Start Superset with this PR code and send `POST 
/api/v1/database/<pk>/upload/` (upload
   endpoint implemented in `superset/databases/api.py:63-80,1789-1796`) with 
`type=columnar`
   and the ZIP from step 1 as `file`.
   
   3. The view constructs a `ColumnarReader` 
(`superset/databases/api.py:68-73`) and calls
   `UploadCommand(...).run()`, which in turn calls `BaseDataReader.read()` and
   `ColumnarReader.file_to_dataframe()`
   (`superset/commands/database/uploaders/columnar_reader.py:51-59`), leading to
   `_yield_files()` (`line 76` hunk) when the filename ends with `.zip`.
   
   4. Inside `_yield_files`, when `file_suffix == "zip"`, 
`check_is_safe_zip(zip_file)` is
   called (`superset/commands/database/uploaders/columnar_reader.py:33-37`), 
which raises a
   plain `SupersetException` if the compression ratio is too high
   (`superset/utils/core.py:11-29`); this exception is not caught or wrapped in
   `DatabaseUploadFailed` anywhere in `ColumnarReader` or `UploadCommand`, so 
it bubbles up
   to the global Flask error handler `show_unexpected_exception`
   (`superset/views/error_handling.py:39-57`), which returns an HTTP 500 
"generic backend
   error", whereas other upload validation errors raise `DatabaseUploadFailed`
   (`superset/commands/database/exceptions.py:17-19`, a `CommandException`) and 
are rendered
   as structured 4xx/422 errors via the `CommandException` handler
   (`superset/views/error_handling.py:45-37`), creating inconsistent API error 
semantics for
   this specific ZIP-safety failure.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=4b03f8dc8dc44cae92c3893f3808b244&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=4b03f8dc8dc44cae92c3893f3808b244&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/database/uploaders/columnar_reader.py
   **Line:** 94:96
   **Comment:**
        *Api Mismatch: `check_is_safe_zip()` raises `SupersetException`, but 
this reader otherwise raises `DatabaseUploadFailed` for user upload errors. 
Letting `SupersetException` bubble here changes upload failures into generic 
500 responses instead of the expected 4xx/422-style upload error handling. 
Catch and re-raise zip-safety failures as `DatabaseUploadFailed` to preserve 
the existing API error contract.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=5800160daead4390d1bc0c256552851a8316486695713ebe92cd2213b7db68eb&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=5800160daead4390d1bc0c256552851a8316486695713ebe92cd2213b7db68eb&reaction=dislike'>👎</a>



##########
superset/daos/base.py:
##########
@@ -749,7 +750,10 @@ def list(  # noqa: C901
             else:
                 query = query.order_by(asc(column))
         page = page
-        page_size = max(page_size, 1)
+        # Clamp the page size to a sane range: at least 1, and no larger than
+        # the configured upper bound, to keep result sets bounded.
+        max_page_size = current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE", 
1000)
+        page_size = min(max(page_size, 1), max_page_size)

Review Comment:
   **Suggestion:** The new clamp trusts `SQLALCHEMY_DAO_MAX_PAGE_SIZE` without 
validating it is at least 1. If that config is set to `0` or a negative value, 
`page_size` becomes non-positive (`0` or negative), which breaks pagination 
semantics and can even produce effectively unbounded queries on some databases. 
Normalize the configured max to a positive integer before applying `min(...)`. 
[incorrect condition logic]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ DAO-backed list endpoints can emit zero-row result pages.
   - ⚠️ Negative max page size may cause unbounded DAO queries.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Override `SQLALCHEMY_DAO_MAX_PAGE_SIZE` in deployment config (e.g.
   `superset_config.py`) to `0` or a negative value, which is read via
   `current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE", 1000)` in 
`BaseDAO.list`
   (`superset/daos/base.py:755`); the default in core config is `1000`
   (`superset/config.py:174`).
   
   2. Call any DAO `list()` method that delegates to `BaseDAO.list` (e.g. 
`UserDAO.list`
   exercised in tests at 
`tests/integration_tests/dao/base_dao_test.py:626-639`) with a
   normal positive `page_size` argument such as `50`.
   
   3. In `BaseDAO.list`, after building and ordering the query
   (`superset/daos/base.py:720-732`), the page size is clamped using `page_size 
=
   min(max(page_size, 1), max_page_size)` (`superset/daos/base.py:756`); with 
`max_page_size
   == 0`, this computes `page_size = min(max(50, 1), 0) == 0`, or with 
`max_page_size == -5`
   it computes `page_size = -5`.
   
   4. The DAO then applies `query = query.offset(page * 
page_size).limit(page_size)`
   (`superset/daos/base.py:757`), so a misconfigured non-positive
   `SQLALCHEMY_DAO_MAX_PAGE_SIZE` causes `LIMIT 0` (empty results) or `LIMIT -5`
   (driver-specific behavior, often treated as an effectively unbounded query 
or an error),
   breaking the intended "bounded page size" guarantee for all DAO-backed list 
endpoints that
   use `BaseDAO.list`.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=3718f5461364451080a9f419a42e8848&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=3718f5461364451080a9f419a42e8848&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/daos/base.py
   **Line:** 755:756
   **Comment:**
        *Incorrect Condition Logic: The new clamp trusts 
`SQLALCHEMY_DAO_MAX_PAGE_SIZE` without validating it is at least 1. If that 
config is set to `0` or a negative value, `page_size` becomes non-positive (`0` 
or negative), which breaks pagination semantics and can even produce 
effectively unbounded queries on some databases. Normalize the configured max 
to a positive integer before applying `min(...)`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=4f9035951e671fb772869c9d55d1d98370c9e4ab9e5dbf723c8f3eacc438f8e1&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40637&comment_hash=4f9035951e671fb772869c9d55d1d98370c9e4ab9e5dbf723c8f3eacc438f8e1&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to