sha174n commented on PR #40860: URL: https://github.com/apache/superset/pull/40860#issuecomment-4699258825
Nice consolidation: folding the columnar-only size check into a shared `validate_file_size` on the command (and the metadata endpoint), with a proper `413`, is a clear improvement. 121 tests pass. A couple of small things: **Lost the non-seekable fallback.** The previous columnar `_check_file_size` wrapped `tell()`/`seek()` in `try/except (AttributeError, OSError)` and skipped when the stream wasn't seekable. `_file_size_bytes` drops that, so a non-seekable upload stream would now raise and surface as a `500` rather than skipping the check. Werkzeug's `FileStorage` is seekable so it's unlikely in practice, but restoring the guard (or asserting the assumption) keeps the old robustness. **Complementary WSGI-level bound.** The check runs after Werkzeug has already spooled the full upload to a `SpooledTemporaryFile`, so it prevents the pandas/BytesIO read (the goal) but the bytes are already received. Pairing it with Flask `MAX_CONTENT_LENGTH` rejects oversized bodies at the WSGI layer before spooling, worth a mention in the config docs as the first line of defense. **Optional:** the columnar path lost its dedicated size test along with the old check; one test asserting an oversized parquet upload is rejected through the shared path would lock in the consolidation. Tiny: the description says `validate()` never checked size, but the columnar reader did, this PR consolidates it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
