rusackas commented on code in PR #40637:
URL: https://github.com/apache/superset/pull/40637#discussion_r3338099145


##########
superset/commands/database/uploaders/columnar_reader.py:
##########
@@ -89,6 +90,9 @@ def _yield_files(file: FileStorage) -> Generator[IO[bytes], 
None, None]:
                 raise DatabaseUploadFailed(_("Not a valid ZIP file"))
             try:
                 with ZipFile(file) as zip_file:
+                    # guard against decompression bombs before reading entries,
+                    # mirroring the importer path
+                    check_is_safe_zip(zip_file)
                     # check if all file types are of the same extension

Review Comment:
   Fixed — wrapped the `check_is_safe_zip()` call in a `try/except 
SupersetException` block and re-raises as `DatabaseUploadFailed`. This keeps 
the unsafe-ZIP rejection aligned with other upload validation errors (422 
instead of 500). Tests updated to assert `DatabaseUploadFailed`.



##########
superset/daos/base.py:
##########
@@ -749,7 +750,10 @@ def list(  # noqa: C901
             else:
                 query = query.order_by(asc(column))
         page = page
-        page_size = max(page_size, 1)
+        # Clamp the page size to a sane range: at least 1, and no larger than
+        # the configured upper bound, to keep result sets bounded.
+        max_page_size = current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE", 
1000)
+        page_size = min(max(page_size, 1), max_page_size)

Review Comment:
   Valid concern, but the config key is documented as an int and operator 
misconfiguration (string or ≤0) is an admin error, not a user error. Adding a 
startup-time validation guard for this config would be a separate follow-up to 
avoid scope creep here.



##########
superset/daos/base.py:
##########
@@ -749,7 +750,10 @@ def list(  # noqa: C901
             else:
                 query = query.order_by(asc(column))
         page = page
-        page_size = max(page_size, 1)
+        # Clamp the page size to a sane range: at least 1, and no larger than
+        # the configured upper bound, to keep result sets bounded.
+        max_page_size = current_app.config.get("SQLALCHEMY_DAO_MAX_PAGE_SIZE", 
1000)
+        page_size = min(max(page_size, 1), max_page_size)

Review Comment:
   Same as above — this is a pre-existing config validation gap, not introduced 
by this PR. Follow-up candidate.



##########
superset/commands/database/uploaders/columnar_reader.py:
##########
@@ -89,6 +90,9 @@ def _yield_files(file: FileStorage) -> Generator[IO[bytes], 
None, None]:
                 raise DatabaseUploadFailed(_("Not a valid ZIP file"))
             try:
                 with ZipFile(file) as zip_file:
+                    # guard against decompression bombs before reading entries,
+                    # mirroring the importer path
+                    check_is_safe_zip(zip_file)
                     # check if all file types are of the same extension

Review Comment:
   Fixed — see the update to comment above. SupersetException from 
check_is_safe_zip is now caught and re-raised as DatabaseUploadFailed.



##########
tests/unit_tests/commands/databases/columnar_reader_test.py:
##########
@@ -28,6 +28,7 @@
     ColumnarReader,
     ColumnarReaderOptions,
 )
+from superset.exceptions import SupersetException
 from tests.unit_tests.fixtures.common import create_columnar_file

Review Comment:
   Removed the SupersetException import from the test file — it's no longer 
needed after updating assertions to expect DatabaseUploadFailed.



##########
tests/unit_tests/commands/databases/columnar_reader_test.py:
##########
@@ -230,6 +231,40 @@ def test_columnar_reader_bad_zip():
     assert str(ex.value) == "Not a valid ZIP file"
 
 
+def _make_high_ratio_zip() -> io.BytesIO:
+    """
+    Build a ZIP whose single entry has a very high decompression ratio,
+    well above the default ``ZIP_FILE_MAX_COMPRESS_RATIO`` threshold.
+    """
+    buffer = io.BytesIO()
+    with ZipFile(buffer, "w", ZIP_DEFLATED) as zip_file:
+        # A megabyte of zeros compresses to roughly a kilobyte, far exceeding
+        # the default 200:1 ratio guard.
+        zip_file.writestr("test.parquet", b"\x00" * (1024 * 1024))
+    buffer.seek(0)
+    return buffer
+
+
+def test_columnar_reader_unsafe_zip_rejected():
+    reader = ColumnarReader(
+        options=ColumnarReaderOptions(),
+    )
+    unsafe_zip = _make_high_ratio_zip()
+    with pytest.raises(SupersetException) as ex:
+        reader.file_to_dataframe(FileStorage(unsafe_zip, "test.zip"))
+    assert "compress ratio above allowed threshold" in str(ex.value)
+
+
+def test_columnar_reader_unsafe_zip_rejected_in_metadata():
+    reader = ColumnarReader(
+        options=ColumnarReaderOptions(),
+    )
+    unsafe_zip = _make_high_ratio_zip()
+    with pytest.raises(SupersetException) as ex:
+        reader.file_metadata(FileStorage(unsafe_zip, "test.zip"))
+    assert "compress ratio above allowed threshold" in str(ex.value)

Review Comment:
   Updated — both test assertions now expect DatabaseUploadFailed with the same 
error message check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to