sha174n commented on PR #40860:
URL: https://github.com/apache/superset/pull/40860#issuecomment-4699258825

   Nice consolidation: folding the columnar-only size check into a shared 
`validate_file_size` on the command (and the metadata endpoint), with a proper 
`413`, is a clear improvement. 121 tests pass. A couple of small things:
   
   **Lost the non-seekable fallback.** The previous columnar `_check_file_size` 
wrapped `tell()`/`seek()` in `try/except (AttributeError, OSError)` and skipped 
when the stream wasn't seekable. `_file_size_bytes` drops that, so a 
non-seekable upload stream would now raise and surface as a `500` rather than 
skipping the check. Werkzeug's `FileStorage` is seekable so it's unlikely in 
practice, but restoring the guard (or asserting the assumption) keeps the old 
robustness.
   
   **Complementary WSGI-level bound.** The check runs after Werkzeug has 
already spooled the full upload to a `SpooledTemporaryFile`, so it prevents the 
pandas/BytesIO read (the goal) but the bytes are already received. Pairing it 
with Flask `MAX_CONTENT_LENGTH` rejects oversized bodies at the WSGI layer 
before spooling, worth a mention in the config docs as the first line of 
defense.
   
   **Optional:** the columnar path lost its dedicated size test along with the 
old check; one test asserting an oversized parquet upload is rejected through 
the shared path would lock in the consolidation.
   
   Tiny: the description says `validate()` never checked size, but the columnar 
reader did, this PR consolidates it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to