Copilot commented on code in PR #40637:
URL: https://github.com/apache/superset/pull/40637#discussion_r3337903546
##########
superset/daos/base.py:
##########
@@ -749,7 +750,10 @@ def list( # noqa: C901
else:
query = query.order_by(asc(column))
page = page
- page_size = max(page_size, 1)
+ # Clamp the page size to a sane range: at least 1, and no larger than
+ # the configured upper bound, to keep result sets bounded.
+ max_page_size = current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE",
1000)
+ page_size = min(max(page_size, 1), max_page_size)
Review Comment:
`max_page_size` is pulled from config without type coercion/validation. If
an operator sets `SQLALCHEMY_DAO_MAX_PAGE_SIZE` to a non-int (e.g. a string) or
<= 0, `min(max(page_size, 1), max_page_size)` can raise a `TypeError` or
produce `page_size=0`, which breaks the existing “at least 1” contract. Coerce
to `int` and clamp the max to at least 1 before applying it.
##########
tests/unit_tests/commands/databases/columnar_reader_test.py:
##########
@@ -230,6 +231,40 @@ def test_columnar_reader_bad_zip():
assert str(ex.value) == "Not a valid ZIP file"
+def _make_high_ratio_zip() -> io.BytesIO:
+ """
+ Build a ZIP whose single entry has a very high decompression ratio,
+ well above the default ``ZIP_FILE_MAX_COMPRESS_RATIO`` threshold.
+ """
+ buffer = io.BytesIO()
+ with ZipFile(buffer, "w", ZIP_DEFLATED) as zip_file:
+ # A megabyte of zeros compresses to roughly a kilobyte, far exceeding
+ # the default 200:1 ratio guard.
+ zip_file.writestr("test.parquet", b"\x00" * (1024 * 1024))
+ buffer.seek(0)
+ return buffer
+
+
+def test_columnar_reader_unsafe_zip_rejected():
+ reader = ColumnarReader(
+ options=ColumnarReaderOptions(),
+ )
+ unsafe_zip = _make_high_ratio_zip()
+ with pytest.raises(SupersetException) as ex:
+ reader.file_to_dataframe(FileStorage(unsafe_zip, "test.zip"))
+ assert "compress ratio above allowed threshold" in str(ex.value)
+
+
+def test_columnar_reader_unsafe_zip_rejected_in_metadata():
+ reader = ColumnarReader(
+ options=ColumnarReaderOptions(),
+ )
+ unsafe_zip = _make_high_ratio_zip()
+ with pytest.raises(SupersetException) as ex:
+ reader.file_metadata(FileStorage(unsafe_zip, "test.zip"))
+ assert "compress ratio above allowed threshold" in str(ex.value)
Review Comment:
Unsafe ZIPs should be rejected as a file upload error
(`DatabaseUploadFailed`, status 422) rather than a generic `SupersetException`
(status 500). Updating the assertions to expect `DatabaseUploadFailed` keeps
the tests aligned with other columnar-reader error cases and with the API’s
error semantics.
##########
superset/commands/database/uploaders/columnar_reader.py:
##########
@@ -89,6 +90,9 @@ def _yield_files(file: FileStorage) -> Generator[IO[bytes],
None, None]:
raise DatabaseUploadFailed(_("Not a valid ZIP file"))
try:
with ZipFile(file) as zip_file:
+ # guard against decompression bombs before reading entries,
+ # mirroring the importer path
+ check_is_safe_zip(zip_file)
# check if all file types are of the same extension
Review Comment:
`check_is_safe_zip()` raises `SupersetException` (default status=500).
Letting that bubble out of the upload reader turns a client-side validation
failure (unsafe ZIP) into a 500 response, unlike other parsing errors which
raise `DatabaseUploadFailed` (422). Catch and wrap the exception so the upload
endpoint reports a 4xx and stays consistent with the rest of the upload error
handling.
##########
tests/unit_tests/commands/databases/columnar_reader_test.py:
##########
@@ -28,6 +28,7 @@
ColumnarReader,
ColumnarReaderOptions,
)
+from superset.exceptions import SupersetException
from tests.unit_tests.fixtures.common import create_columnar_file
Review Comment:
The test imports/uses `SupersetException`, but the columnar upload readers
generally surface user file/validation problems as `DatabaseUploadFailed`
(422). After wrapping unsafe ZIPs as `DatabaseUploadFailed`, this import
becomes unnecessary and will fail linting as unused.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]