codeant-ai-for-open-source[bot] commented on code in PR #40860:
URL: https://github.com/apache/superset/pull/40860#discussion_r3408416595


##########
superset/commands/database/uploaders/base.py:
##########
@@ -188,6 +190,43 @@ def run(self) -> None:
 
         sqla_table.fetch_metadata()
 
+    @staticmethod
+    def _file_size_bytes(file: Any) -> Optional[int]:
+        """
+        Return the size of an uploaded file without consuming its stream.
+
+        Returns ``None`` when the stream is not seekable, in which case the
+        size cannot be determined cheaply and the size check is skipped in
+        favour of downstream guards.
+        """
+        stream = getattr(file, "stream", file)
+        try:
+            position = stream.tell()
+            stream.seek(0, 2)  # seek to end
+            size = stream.tell()
+            stream.seek(position)  # restore the original position
+        except (AttributeError, OSError):
+            return None
+        return size
+
+    @classmethod
+    def validate_file_size(cls, file: Any) -> None:
+        """
+        Reject a file whose size exceeds ``UPLOAD_MAX_FILE_SIZE_BYTES``.
+
+        Shared by the upload command and the metadata endpoint so oversized
+        files are rejected before their contents are read into memory,
+        regardless of which path is used.
+
+        :raises DatabaseUploadFileTooLarge: if the file is larger than the 
limit
+        """
+        max_file_size = current_app.config.get("UPLOAD_MAX_FILE_SIZE_BYTES")
+        if max_file_size is None or file is None:
+            return
+        size = cls._file_size_bytes(file)
+        if size is not None and size > max_file_size:
+            raise DatabaseUploadFileTooLarge()

Review Comment:
   **Suggestion:** This check always enforces the limit whenever 
`UPLOAD_MAX_FILE_SIZE_BYTES` is non-`None`, but the current default config is 
non-`None` (100 MB), so the feature is no longer opt-in and changes existing 
behavior by rejecting large uploads by default. To preserve backward 
compatibility, either default the config to `None` or gate enforcement behind 
explicit user configuration. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Large database uploads >100 MB now fail by default.
   - ⚠️ Existing CSV/Excel upload workflows may regress unexpectedly.
   - ⚠️ Behavior contradicts documented opt-in configuration semantics.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Observe the default config in `superset/config.py:1146-1148` (via Grep): 
comments state
   "Set to ``None`` to disable the check. Defaults to 100 MB." followed by
   `UPLOAD_MAX_FILE_SIZE_BYTES: int | None = 100 * 1024 * 1024`, meaning a 
non-`None` 100 MB
   limit is enabled by default.
   
   2. In `superset/commands/database/uploaders/base.py:223-228`,
   `UploadCommand.validate_file_size()` reads `max_file_size =
   current_app.config.get("UPLOAD_MAX_FILE_SIZE_BYTES")` and immediately 
enforces it whenever
   it is non-`None` (`if max_file_size is None or file is None: return`), 
raising
   `DatabaseUploadFileTooLarge()` when `size > max_file_size`.
   
   3. The upload metadata endpoint `DatabaseRestApi.upload_metadata` in
   `superset/databases/api.py:25-79` loads `parameters["file"]` and then calls
   `UploadCommand.validate_file_size(parameters["file"])` at line 70, so any 
POST to
   `/api/v1/database/upload_metadata/` (resource `DatabaseRestApi`,
   `@expose("/upload_metadata/", methods=("POST",))`) with a file >100 MB will 
now raise
   `DatabaseUploadFileTooLarge` under default config.
   
   4. The main upload endpoint `DatabaseRestApi.upload` in 
`superset/databases/api.py:89-18`
   constructs `UploadCommand(...).run()` at lines 9-15 of the second snippet, 
whose
   `validate()` method (in `UploadCommand.validate` at
   `superset/commands/database/uploaders/base.py:230-239`) calls
   `self.validate_file_size(self._file)`, so a POST to 
`/api/v1/database/<pk>/upload/` with a
   file >100 MB also raises `DatabaseUploadFileTooLarge` by default. Since 
previously there
   was no size check, this PR changes behavior from "no limit" to "100 MB limit 
by default",
   contradicting the PR's stated "opt-in" design and potentially breaking 
existing
   large-upload workflows unless the operator explicitly sets 
`UPLOAD_MAX_FILE_SIZE_BYTES =
   None`.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=083327362f7a4e0eacd59b048ec410ca&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=083327362f7a4e0eacd59b048ec410ca&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/database/uploaders/base.py
   **Line:** 223:228
   **Comment:**
        *Logic Error: This check always enforces the limit whenever 
`UPLOAD_MAX_FILE_SIZE_BYTES` is non-`None`, but the current default config is 
non-`None` (100 MB), so the feature is no longer opt-in and changes existing 
behavior by rejecting large uploads by default. To preserve backward 
compatibility, either default the config to `None` or gate enforcement behind 
explicit user configuration.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=ffbe0b239f5a0636fba5aa89b83a08d609337a4c3c15fa5abcb012bae28b4fc8&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=ffbe0b239f5a0636fba5aa89b83a08d609337a4c3c15fa5abcb012bae28b4fc8&reaction=dislike'>👎</a>



##########
superset/databases/api.py:
##########
@@ -1718,6 +1718,15 @@ def upload_metadata(self) -> Response:
               $ref: '#/components/responses/401'
             404:
               $ref: '#/components/responses/404'
+            413:
+              description: Payload too large, file exceeds the maximum allowed 
size
+              content:
+                application/json:
+                  schema:
+                    type: object
+                    properties:
+                      message:
+                        type: string

Review Comment:
   **Suggestion:** The new 413 OpenAPI schema is incorrect: this endpoint 
returns Superset's standard error envelope (`errors` array), not a top-level 
`message` field. Keeping this schema will generate wrong client contracts and 
break consumers that rely on the documented response shape. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ OpenAPI for upload_metadata 413 misstates JSON response shape.
   - ⚠️ Generated clients may expect `message` instead of `errors[]`.
   - ⚠️ Client-side error parsing for 413s can fail or misbehave.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Inspect the OpenAPI-style docstring for `DatabaseRestApi.upload_metadata` 
in
   `superset/databases/api.py:25-63`. The new 413 response block at lines 
1721-1729 documents
   `413` with `content.application/json.schema.type: object` and a top-level
   `properties.message: string`.
   
   2. Follow the implementation of `upload_metadata` in 
`superset/databases/api.py:64-79`.
   After loading `parameters` from `UploadFileMetadataPostSchema`, it calls
   `UploadCommand.validate_file_size(parameters["file"])` at line 70, which 
raises
   `DatabaseUploadFileTooLarge` when the file exceeds 
`UPLOAD_MAX_FILE_SIZE_BYTES` (see
   `UploadCommand.validate_file_size` in
   `superset/commands/database/uploaders/base.py:212-228`).
   
   3. `DatabaseUploadFileTooLarge` is a `CommandException` with `status = 413` 
defined in
   `superset/commands/database/exceptions.py:42-45`. Global error handling in
   `superset/views/error_handling.py:184-212` registers `show_command_errors` 
as the Flask
   `@app.errorhandler(CommandException)`, which wraps the exception into a 
`SupersetError`
   and calls `json_error_response([...], status=ex.status)`.
   
   4. `json_error_response` in `superset/views/error_handling.py:68-86` 
serializes a list of
   `SupersetError` objects as a JSON payload with an `errors` array 
(`payload["errors"] =
   [dataclasses.asdict(error) ...]`). Thus, a POST to 
`/api/v1/database/upload_metadata/`
   with an oversized file produces a `413` with body `{"errors": 
[...SupersetError
   fields...]}`, while the OpenAPI doc advertises a `{"message": "..."}` 
object, causing an
   API contract mismatch for clients generated from the spec.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=e2838936638b49a0889221b5f530395a&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=e2838936638b49a0889221b5f530395a&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/databases/api.py
   **Line:** 1721:1729
   **Comment:**
        *Api Mismatch: The new 413 OpenAPI schema is incorrect: this endpoint 
returns Superset's standard error envelope (`errors` array), not a top-level 
`message` field. Keeping this schema will generate wrong client contracts and 
break consumers that rely on the documented response shape.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=c51d4e659cacb9eb8928f8fcda83172763922390a3c7bfd81965767087f3689d&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=c51d4e659cacb9eb8928f8fcda83172763922390a3c7bfd81965767087f3689d&reaction=dislike'>👎</a>



##########
superset/databases/api.py:
##########
@@ -1777,6 +1787,15 @@ def upload(self, pk: int) -> Response:
               $ref: '#/components/responses/401'
             404:
               $ref: '#/components/responses/404'
+            413:
+              description: Payload too large, file exceeds the maximum allowed 
size
+              content:
+                application/json:
+                  schema:
+                    type: object
+                    properties:
+                      message:
+                        type: string

Review Comment:
   **Suggestion:** The 413 schema added to the upload endpoint also documents a 
top-level `message` field, but runtime errors from `CommandException` are 
serialized as an `errors` list. This creates a second contract mismatch for 
generated SDKs and client-side error parsing. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Upload endpoint 413 documentation conflicts with real JSON envelope.
   - ⚠️ SDKs generated from OpenAPI misrepresent 413 error structure.
   - ⚠️ Frontend handling of 413 upload errors may break or misparse.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Inspect the OpenAPI-style docstring for `DatabaseRestApi.upload` in
   `superset/databases/api.py:90-134`. The newly added 413 block at lines 
1790-1798 documents
   a JSON `schema` with `type: object` and a single top-level 
`properties.message: string`.
   
   2. Follow the implementation of `upload` in 
`superset/databases/api.py:135-18`. The
   endpoint loads `parameters` from `UploadPostSchema`, chooses a reader, and 
then
   instantiates `UploadCommand(pk, parameters["table_name"], parameters["file"],
   parameters.get("schema"), reader).run()` at lines 9-15 of the continuation 
(file snippet
   starting at 1809).
   
   3. Inside `UploadCommand.run` 
(`superset/commands/database/uploaders/base.py:158-171`),
   `self.validate()` is called first. `UploadCommand.validate` at
   `superset/commands/database/uploaders/base.py:230-239` concludes with
   `self.validate_file_size(self._file)`, which calls `validate_file_size` and 
thus raises
   `DatabaseUploadFileTooLarge` when the file exceeds 
`UPLOAD_MAX_FILE_SIZE_BYTES` (see
   `validate_file_size` at 212-228 and `DatabaseUploadFileTooLarge` definition 
in
   `superset/commands/database/exceptions.py:42-45`).
   
   4. As in suggestion 2, any `DatabaseUploadFileTooLarge` propagating from 
`upload` is
   handled by the global `CommandException` handler in
   `superset/views/error_handling.py:184-212`, which returns a JSON payload 
with an `errors`
   array built by `json_error_response`. Therefore, a POST to 
`/api/v1/database/<pk>/upload/`
   with an oversized file produces a `413` JSON body shaped like `{"errors": 
[...]}`, while
   the OpenAPI doc for this endpoint claims the body is `{"message": "..."}`, 
again creating
   a concrete mismatch between documented and actual 413 responses.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=19baee0ff0ef40d58ba81cc9a9506548&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=19baee0ff0ef40d58ba81cc9a9506548&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/databases/api.py
   **Line:** 1790:1798
   **Comment:**
        *Api Mismatch: The 413 schema added to the upload endpoint also 
documents a top-level `message` field, but runtime errors from 
`CommandException` are serialized as an `errors` list. This creates a second 
contract mismatch for generated SDKs and client-side error parsing.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=2b3748d4807fe6618d6b393bcae788c2a0c696192504db6c206dab5a20eda81a&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=2b3748d4807fe6618d6b393bcae788c2a0c696192504db6c206dab5a20eda81a&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to