codeant-ai-for-open-source[bot] commented on code in PR #40860:
URL: https://github.com/apache/superset/pull/40860#discussion_r3375063101


##########
superset/commands/database/uploaders/base.py:
##########
@@ -196,3 +208,11 @@ def validate(self) -> None:
             raise DatabaseSchemaUploadNotAllowed()
         if not self._model.db_engine_spec.supports_file_upload:
             raise DatabaseUploadNotSupported()
+
+        max_file_size = current_app.config.get("UPLOAD_MAX_FILE_SIZE_BYTES")
+        if (
+            max_file_size is not None
+            and self._file is not None
+            and self._file_size_bytes(self._file) > max_file_size
+        ):
+            raise DatabaseUploadFileTooLarge()

Review Comment:
   **Suggestion:** The new size limit is enforced only in 
`UploadCommand.validate()`, but the metadata endpoint 
(`/<pk>/upload/form_data`) does not use `UploadCommand` and still calls reader 
`file_metadata()` directly. That leaves a bypass where oversized files can 
still be parsed (including memory-heavy paths like zipped columnar metadata), 
so the intended protection against large uploads is incomplete. Apply the same 
`UPLOAD_MAX_FILE_SIZE_BYTES` check in the metadata flow (or centralize it in 
shared pre-validation used by both upload and metadata endpoints). [security]
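
A minimal sketch of the centralized option, assuming the uploaded file is a seekable file-like stream. `enforce_upload_size_limit`, `file_size_bytes`, `UploadFileTooLargeError`, `upload_flow`, and `metadata_flow` are hypothetical names, not Superset APIs. The diff does not show the PR's own `_file_size_bytes` helper, so the seek/tell measurement here is an assumption. Both flows call the same check before any reader parses the file, which means the metadata path can no longer skip it.

```python
import io
from typing import BinaryIO, Optional


class UploadFileTooLargeError(Exception):
    """Hypothetical stand-in for DatabaseUploadFileTooLarge."""


def file_size_bytes(file: BinaryIO) -> int:
    """Return the total stream size without consuming it (hypothetical helper)."""
    position = file.tell()
    file.seek(0, io.SEEK_END)
    size = file.tell()
    file.seek(position)  # restore the position so readers still see all the data
    return size


def enforce_upload_size_limit(
    file: Optional[BinaryIO], max_file_size: Optional[int]
) -> None:
    """Shared pre-validation that runs before any reader touches the file."""
    if max_file_size is None or file is None:
        return  # limit disabled, or no file to check (mirrors the diff)
    if file_size_bytes(file) > max_file_size:
        raise UploadFileTooLargeError(
            f"file exceeds UPLOAD_MAX_FILE_SIZE_BYTES={max_file_size}"
        )


def upload_flow(file: Optional[BinaryIO], max_file_size: Optional[int]) -> str:
    # Hypothetical: UploadCommand.validate() would delegate to the shared check.
    enforce_upload_size_limit(file, max_file_size)
    return "uploaded"


def metadata_flow(file: Optional[BinaryIO], max_file_size: Optional[int]) -> str:
    # Hypothetical: the metadata endpoint runs the same check before file_metadata().
    enforce_upload_size_limit(file, max_file_size)
    return "metadata"
```

Because both endpoints depend on one function, they cannot drift apart again. The metadata endpoint would presumably map the exception to the same 413-style response that the upload path returns.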
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Metadata endpoint parses oversized files despite configured size limit.
   - ⚠️ Large zipped columnar metadata can exhaust worker memory.
   - ⚠️ Inconsistent behavior between the upload and metadata endpoints confuses users.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In configuration (e.g. `superset/config.py`), set `UPLOAD_MAX_FILE_SIZE_BYTES` to a
   small limit (for example 1 * 1024 * 1024 for 1 MB) so that the new file-size check in
   `UploadCommand.validate()` at `superset/commands/database/uploaders/base.py:44-59` will
   trigger for files larger than this size.
   
   2. Start Superset and issue a POST request to the database upload endpoint
   `/api/v1/database/<pk>/upload/` (implemented as `upload()` in
   `superset/databases/api.py:8-75`), sending a multipart form with a file larger than the
   configured limit and a matching type (CSV/EXCEL/COLUMNAR). The request is parsed into
   `UploadPostSchema`, a reader (e.g. `CSVReader`) is constructed, and
   `UploadCommand(...).run()` is invoked, which calls `UploadCommand.validate()` and raises
   `DatabaseUploadFileTooLarge` when `self._file_size_bytes(self._file) > max_file_size`,
   causing the API to return a 413-like error for the upload path.
   
   3. Now send a POST request to the metadata endpoint `/api/v1/database/upload_metadata/`
   (implemented as `upload_metadata()` in `superset/databases/api.py:12-36,52-60`) with the
   same oversized file in `request.files["file"]` and a matching `type` in the form body.
   This endpoint parses the form with `UploadFileMetadataPostSchema` and then directly calls
   `CSVReader(parameters).file_metadata(parameters["file"])`,
   `ExcelReader(parameters).file_metadata(parameters["file"])`, or
   `ColumnarReader(parameters).file_metadata(parameters["file"])` without passing through
   `UploadCommand.validate()` or any size check.
   
   4. Observe that for this oversized file the metadata path still parses the file: for CSV,
   `CSVReader.file_metadata()` at `superset/commands/database/uploaders/csv_reader.py:11-33`
   reads rows via `_read_csv(file, kwargs)` at `csv_reader.py:6-37,47-55`; for Excel,
   `ExcelReader.file_metadata()` at
   `superset/commands/database/uploaders/excel_reader.py:67-87` constructs
   `pd.ExcelFile(file)` and parses sheets; and for columnar/ZIP,
   `ColumnarReader.file_metadata()` at
   `superset/commands/database/uploaders/columnar_reader.py:88-99` iterates
   `_yield_files(file)`, which reads inner files into `BytesIO` (lines 45-71). The metadata
   endpoint returns HTTP 200 with metadata instead of rejecting the file as too large,
   demonstrating that the configured `UPLOAD_MAX_FILE_SIZE_BYTES` limit is enforced only on
   the upload path and can be bypassed via the metadata path.
   ```
   </details>
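
Step 1 of the reproduction corresponds to a config override along these lines. This sketch assumes Superset's usual `superset_config.py` override module; the key name comes from the diff, and the 1 MB value is only the example limit from step 1, not a recommended default.

```python
# superset_config.py (assumed location of local config overrides)
# Example limit from step 1: reject uploads larger than 1 MB.
UPLOAD_MAX_FILE_SIZE_BYTES = 1 * 1024 * 1024

# Leaving the key unset (or setting it to None) disables the check in the diff.
# UPLOAD_MAX_FILE_SIZE_BYTES = None
```

Because the diff reads the value with `current_app.config.get(...)`, an unset key and an explicit `None` behave the same way: the size check is skipped.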
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

