codeant-ai-for-open-source[bot] commented on code in PR #40860:
URL: https://github.com/apache/superset/pull/40860#discussion_r3408416595
##########
superset/commands/database/uploaders/base.py:
##########
@@ -188,6 +190,43 @@ def run(self) -> None:
sqla_table.fetch_metadata()
+ @staticmethod
+ def _file_size_bytes(file: Any) -> Optional[int]:
+ """
+ Return the size of an uploaded file without consuming its stream.
+
+ Returns ``None`` when the stream is not seekable, in which case the
+ size cannot be determined cheaply and the size check is skipped in
+ favour of downstream guards.
+ """
+ stream = getattr(file, "stream", file)
+ try:
+ position = stream.tell()
+ stream.seek(0, 2) # seek to end
+ size = stream.tell()
+ stream.seek(position) # restore the original position
+ except (AttributeError, OSError):
+ return None
+ return size
+
+ @classmethod
+ def validate_file_size(cls, file: Any) -> None:
+ """
+ Reject a file whose size exceeds ``UPLOAD_MAX_FILE_SIZE_BYTES``.
+
+ Shared by the upload command and the metadata endpoint so oversized
+ files are rejected before their contents are read into memory,
+ regardless of which path is used.
+
+ :raises DatabaseUploadFileTooLarge: if the file is larger than the
limit
+ """
+ max_file_size = current_app.config.get("UPLOAD_MAX_FILE_SIZE_BYTES")
+ if max_file_size is None or file is None:
+ return
+ size = cls._file_size_bytes(file)
+ if size is not None and size > max_file_size:
+ raise DatabaseUploadFileTooLarge()
Review Comment:
**Suggestion:** This check always enforces the limit whenever
`UPLOAD_MAX_FILE_SIZE_BYTES` is non-`None`, but the current default config is
non-`None` (100 MB), so the feature is no longer opt-in and changes existing
behavior by rejecting large uploads by default. To preserve backward
compatibility, either default the config to `None` or gate enforcement behind
explicit user configuration. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Large database uploads >100 MB now fail by default.
- ⚠️ Existing CSV/Excel upload workflows may regress unexpectedly.
- ⚠️ Behavior contradicts documented opt-in configuration semantics.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Observe the default config in `superset/config.py:1146-1148` (via Grep):
comments state
"Set to ``None`` to disable the check. Defaults to 100 MB." followed by
`UPLOAD_MAX_FILE_SIZE_BYTES: int | None = 100 * 1024 * 1024`, meaning a
non-`None` 100 MB
limit is enabled by default.
2. In `superset/commands/database/uploaders/base.py:223-228`,
`UploadCommand.validate_file_size()` reads `max_file_size =
current_app.config.get("UPLOAD_MAX_FILE_SIZE_BYTES")` and immediately
enforces it whenever
it is non-`None` (`if max_file_size is None or file is None: return`),
raising
`DatabaseUploadFileTooLarge()` when `size > max_file_size`.
3. The upload metadata endpoint `DatabaseRestApi.upload_metadata` in
`superset/databases/api.py:25-79` loads `parameters["file"]` and then calls
`UploadCommand.validate_file_size(parameters["file"])` at line 70, so any
POST to
`/api/v1/database/upload_metadata/` (resource `DatabaseRestApi`,
`@expose("/upload_metadata/", methods=("POST",))`) with a file >100 MB will
now raise
`DatabaseUploadFileTooLarge` under default config.
4. The main upload endpoint `DatabaseRestApi.upload` in
`superset/databases/api.py:89-18`
constructs `UploadCommand(...).run()` at lines 9-15 of the second snippet,
whose
`validate()` method (in `UploadCommand.validate` at
`superset/commands/database/uploaders/base.py:230-239`) calls
`self.validate_file_size(self._file)`, so a POST to
`/api/v1/database/<pk>/upload/` with a
file >100 MB also raises `DatabaseUploadFileTooLarge` by default. Since
previously there
was no size check, this PR changes behavior from "no limit" to "100 MB limit
by default",
contradicting the PR's stated "opt-in" design and potentially breaking
existing
large-upload workflows unless the operator explicitly sets
`UPLOAD_MAX_FILE_SIZE_BYTES =
None`.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=083327362f7a4e0eacd59b048ec410ca&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=083327362f7a4e0eacd59b048ec410ca&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/database/uploaders/base.py
**Line:** 223:228
**Comment:**
*Logic Error: This check always enforces the limit whenever
`UPLOAD_MAX_FILE_SIZE_BYTES` is non-`None`, but the current default config is
non-`None` (100 MB), so the feature is no longer opt-in and changes existing
behavior by rejecting large uploads by default. To preserve backward
compatibility, either default the config to `None` or gate enforcement behind
explicit user configuration.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=ffbe0b239f5a0636fba5aa89b83a08d609337a4c3c15fa5abcb012bae28b4fc8&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=ffbe0b239f5a0636fba5aa89b83a08d609337a4c3c15fa5abcb012bae28b4fc8&reaction=dislike'>👎</a>
##########
superset/databases/api.py:
##########
@@ -1718,6 +1718,15 @@ def upload_metadata(self) -> Response:
$ref: '#/components/responses/401'
404:
$ref: '#/components/responses/404'
+ 413:
+ description: Payload too large, file exceeds the maximum allowed
size
+ content:
+ application/json:
+ schema:
+ type: object
+ properties:
+ message:
+ type: string
Review Comment:
**Suggestion:** The new 413 OpenAPI schema is incorrect: this endpoint
returns Superset's standard error envelope (`errors` array), not a top-level
`message` field. Keeping this schema will generate wrong client contracts and
break consumers that rely on the documented response shape. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ OpenAPI for upload_metadata 413 misstates JSON response shape.
- ⚠️ Generated clients may expect `message` instead of `errors[]`.
- ⚠️ Client-side error parsing for 413s can fail or misbehave.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Inspect the OpenAPI-style docstring for `DatabaseRestApi.upload_metadata`
in
`superset/databases/api.py:25-63`. The new 413 response block at lines
1721-1729 documents
`413` with `content.application/json.schema.type: object` and a top-level
`properties.message: string`.
2. Follow the implementation of `upload_metadata` in
`superset/databases/api.py:64-79`.
After loading `parameters` from `UploadFileMetadataPostSchema`, it calls
`UploadCommand.validate_file_size(parameters["file"])` at line 70, which
raises
`DatabaseUploadFileTooLarge` when the file exceeds
`UPLOAD_MAX_FILE_SIZE_BYTES` (see
`UploadCommand.validate_file_size` in
`superset/commands/database/uploaders/base.py:212-228`).
3. `DatabaseUploadFileTooLarge` is a `CommandException` with `status = 413`
defined in
`superset/commands/database/exceptions.py:42-45`. Global error handling in
`superset/views/error_handling.py:184-212` registers `show_command_errors`
as the Flask
`@app.errorhandler(CommandException)`, which wraps the exception into a
`SupersetError`
and calls `json_error_response([...], status=ex.status)`.
4. `json_error_response` in `superset/views/error_handling.py:68-86`
serializes a list of
`SupersetError` objects as a JSON payload with an `errors` array
(`payload["errors"] =
[dataclasses.asdict(error) ...]`). Thus, a POST to
`/api/v1/database/upload_metadata/`
with an oversized file produces a `413` with body `{"errors":
[...SupersetError
fields...]}`, while the OpenAPI doc advertises a `{"message": "..."}`
object, causing an
API contract mismatch for clients generated from the spec.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=e2838936638b49a0889221b5f530395a&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=e2838936638b49a0889221b5f530395a&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/databases/api.py
**Line:** 1721:1729
**Comment:**
*Api Mismatch: The new 413 OpenAPI schema is incorrect: this endpoint
returns Superset's standard error envelope (`errors` array), not a top-level
`message` field. Keeping this schema will generate wrong client contracts and
break consumers that rely on the documented response shape.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=c51d4e659cacb9eb8928f8fcda83172763922390a3c7bfd81965767087f3689d&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=c51d4e659cacb9eb8928f8fcda83172763922390a3c7bfd81965767087f3689d&reaction=dislike'>👎</a>
##########
superset/databases/api.py:
##########
@@ -1777,6 +1787,15 @@ def upload(self, pk: int) -> Response:
$ref: '#/components/responses/401'
404:
$ref: '#/components/responses/404'
+ 413:
+ description: Payload too large, file exceeds the maximum allowed
size
+ content:
+ application/json:
+ schema:
+ type: object
+ properties:
+ message:
+ type: string
Review Comment:
**Suggestion:** The 413 schema added to the upload endpoint also documents a
top-level `message` field, but runtime errors from `CommandException` are
serialized as an `errors` list. This creates a second contract mismatch for
generated SDKs and client-side error parsing. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Upload endpoint 413 documentation conflicts with real JSON envelope.
- ⚠️ SDKs generated from OpenAPI misrepresent 413 error structure.
- ⚠️ Frontend handling of 413 upload errors may break or misparse.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Inspect the OpenAPI-style docstring for `DatabaseRestApi.upload` in
`superset/databases/api.py:90-134`. The newly added 413 block at lines
1790-1798 documents
a JSON `schema` with `type: object` and a single top-level
`properties.message: string`.
2. Follow the implementation of `upload` in
`superset/databases/api.py:135-18`. The
endpoint loads `parameters` from `UploadPostSchema`, chooses a reader, and
then
instantiates `UploadCommand(pk, parameters["table_name"], parameters["file"],
parameters.get("schema"), reader).run()` at lines 9-15 of the continuation
(file snippet
starting at 1809).
3. Inside `UploadCommand.run`
(`superset/commands/database/uploaders/base.py:158-171`),
`self.validate()` is called first. `UploadCommand.validate` at
`superset/commands/database/uploaders/base.py:230-239` concludes with
`self.validate_file_size(self._file)`, which calls `validate_file_size` and
thus raises
`DatabaseUploadFileTooLarge` when the file exceeds
`UPLOAD_MAX_FILE_SIZE_BYTES` (see
`validate_file_size` at 212-228 and `DatabaseUploadFileTooLarge` definition
in
`superset/commands/database/exceptions.py:42-45`).
4. As in suggestion 2, any `DatabaseUploadFileTooLarge` propagating from
`upload` is
handled by the global `CommandException` handler in
`superset/views/error_handling.py:184-212`, which returns a JSON payload
with an `errors`
array built by `json_error_response`. Therefore, a POST to
`/api/v1/database/<pk>/upload/`
with an oversized file produces a `413` JSON body shaped like `{"errors":
[...]}`, while
the OpenAPI doc for this endpoint claims the body is `{"message": "..."}`,
again creating
a concrete mismatch between documented and actual 413 responses.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=19baee0ff0ef40d58ba81cc9a9506548&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=19baee0ff0ef40d58ba81cc9a9506548&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/databases/api.py
**Line:** 1790:1798
**Comment:**
*Api Mismatch: The 413 schema added to the upload endpoint also
documents a top-level `message` field, but runtime errors from
`CommandException` are serialized as an `errors` list. This creates a second
contract mismatch for generated SDKs and client-side error parsing.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=2b3748d4807fe6618d6b393bcae788c2a0c696192504db6c206dab5a20eda81a&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40860&comment_hash=2b3748d4807fe6618d6b393bcae788c2a0c696192504db6c206dab5a20eda81a&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]