rusackas commented on code in PR #40637:
URL: https://github.com/apache/superset/pull/40637#discussion_r3338099145
##########
superset/commands/database/uploaders/columnar_reader.py:
##########
@@ -89,6 +90,9 @@ def _yield_files(file: FileStorage) -> Generator[IO[bytes],
None, None]:
raise DatabaseUploadFailed(_("Not a valid ZIP file"))
try:
with ZipFile(file) as zip_file:
+ # guard against decompression bombs before reading entries,
+ # mirroring the importer path
+ check_is_safe_zip(zip_file)
# check if all file types are of the same extension
Review Comment:
Fixed — wrapped the `check_is_safe_zip()` call in a `try/except
SupersetException` block and re-raises as `DatabaseUploadFailed`. This keeps
the unsafe-ZIP rejection aligned with other upload validation errors (422
instead of 500). Tests updated to assert `DatabaseUploadFailed`.
##########
superset/daos/base.py:
##########
@@ -749,7 +750,10 @@ def list( # noqa: C901
else:
query = query.order_by(asc(column))
page = page
- page_size = max(page_size, 1)
+ # Clamp the page size to a sane range: at least 1, and no larger than
+ # the configured upper bound, to keep result sets bounded.
+ max_page_size = current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE",
1000)
+ page_size = min(max(page_size, 1), max_page_size)
Review Comment:
Valid concern, but the config key is documented as an int and operator
misconfiguration (string or ≤0) is an admin error, not a user error. Adding a
startup-time validation guard for this config would be a separate follow-up to
avoid scope creep here.
##########
superset/daos/base.py:
##########
@@ -749,7 +750,10 @@ def list( # noqa: C901
else:
query = query.order_by(asc(column))
page = page
- page_size = max(page_size, 1)
+ # Clamp the page size to a sane range: at least 1, and no larger than
+ # the configured upper bound, to keep result sets bounded.
+ max_page_size = current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE",
1000)
+ page_size = min(max(page_size, 1), max_page_size)
Review Comment:
Same as above — this is a pre-existing config validation gap, not introduced
by this PR. Follow-up candidate.
##########
superset/commands/database/uploaders/columnar_reader.py:
##########
@@ -89,6 +90,9 @@ def _yield_files(file: FileStorage) -> Generator[IO[bytes],
None, None]:
raise DatabaseUploadFailed(_("Not a valid ZIP file"))
try:
with ZipFile(file) as zip_file:
+ # guard against decompression bombs before reading entries,
+ # mirroring the importer path
+ check_is_safe_zip(zip_file)
# check if all file types are of the same extension
Review Comment:
Fixed — see the update to comment above. SupersetException from
check_is_safe_zip is now caught and re-raised as DatabaseUploadFailed.
##########
tests/unit_tests/commands/databases/columnar_reader_test.py:
##########
@@ -28,6 +28,7 @@
ColumnarReader,
ColumnarReaderOptions,
)
+from superset.exceptions import SupersetException
from tests.unit_tests.fixtures.common import create_columnar_file
Review Comment:
Removed the SupersetException import from the test file — it's no longer
needed after updating assertions to expect DatabaseUploadFailed.
##########
tests/unit_tests/commands/databases/columnar_reader_test.py:
##########
@@ -230,6 +231,40 @@ def test_columnar_reader_bad_zip():
assert str(ex.value) == "Not a valid ZIP file"
+def _make_high_ratio_zip() -> io.BytesIO:
+ """
+ Build a ZIP whose single entry has a very high decompression ratio,
+ well above the default ``ZIP_FILE_MAX_COMPRESS_RATIO`` threshold.
+ """
+ buffer = io.BytesIO()
+ with ZipFile(buffer, "w", ZIP_DEFLATED) as zip_file:
+ # A megabyte of zeros compresses to roughly a kilobyte, far exceeding
+ # the default 200:1 ratio guard.
+ zip_file.writestr("test.parquet", b"\x00" * (1024 * 1024))
+ buffer.seek(0)
+ return buffer
+
+
+def test_columnar_reader_unsafe_zip_rejected():
+ reader = ColumnarReader(
+ options=ColumnarReaderOptions(),
+ )
+ unsafe_zip = _make_high_ratio_zip()
+ with pytest.raises(SupersetException) as ex:
+ reader.file_to_dataframe(FileStorage(unsafe_zip, "test.zip"))
+ assert "compress ratio above allowed threshold" in str(ex.value)
+
+
+def test_columnar_reader_unsafe_zip_rejected_in_metadata():
+ reader = ColumnarReader(
+ options=ColumnarReaderOptions(),
+ )
+ unsafe_zip = _make_high_ratio_zip()
+ with pytest.raises(SupersetException) as ex:
+ reader.file_metadata(FileStorage(unsafe_zip, "test.zip"))
+ assert "compress ratio above allowed threshold" in str(ex.value)
Review Comment:
Updated — both test assertions now expect DatabaseUploadFailed with the same
error message check.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]