Re: [PR] fix(reports): export server-paginated table row limits [superset]

via GitHub Mon, 15 Jun 2026 23:38:11 -0700


codeant-ai-for-open-source[bot] commented on code in PR #41103:
URL: https://github.com/apache/superset/pull/41103#discussion_r3418614107



##########
superset/commands/report/execute.py:
##########
@@ -506,9 +509,88 @@ def _get_pdf(self) -> bytes:
 
         return pdf
 
+    def _get_chart_data_request_payload(
+        self,
+        result_format: ChartDataResultFormat,
+    ) -> dict[str, Any]:
+        """
+        Build the same POST payload shape used by frontend exports.
+        """
+        try:
+            query_context = json.loads(self._report_schedule.chart.query_context)
+        except (TypeError, json.JSONDecodeError) as ex:
+            raise ReportScheduleExecuteUnexpectedError(
+                "Chart has no valid query context saved."
+            ) from ex
+
+        if not isinstance(query_context, dict):
+            raise ReportScheduleExecuteUnexpectedError(
+                "Chart has no valid query context saved."
+            )
+
+        result_type = ChartDataResultType.POST_PROCESSED.value
+        force = bool(self._report_schedule.force_screenshot)
+        query_context["result_format"] = result_format.value
+        query_context["result_type"] = result_type
+        query_context["force"] = force
+
+        form_data = query_context.get("form_data")
+        if isinstance(form_data, dict):
+            form_data["result_format"] = result_format.value
+            form_data["result_type"] = result_type
+            form_data["force"] = force
+
+            if form_data.get("server_pagination"):
+                row_limit = form_data.get("row_limit") or 0
+                queries = query_context.get("queries")
+                if isinstance(queries, list):
+                    data_query_updated = False
+                    download_queries = []
+                    for query in queries:
+                        if isinstance(query, dict) and query.get("is_rowcount"):
+                            continue
+                        if isinstance(query, dict) and not data_query_updated:
+                            query = {
+                                **query,
+                                "row_limit": row_limit,
+                                "row_offset": 0,
+                            }
+                            data_query_updated = True
+                        download_queries.append(query)
+                    query_context["queries"] = download_queries
+
+        return query_context
+
+    @staticmethod
+    def _post_chart_data(
+        chart_url: str,
+        auth_cookies: Optional[dict[str, str]],
+        request_payload: dict[str, Any],
+    ) -> Optional[bytes]:
+        if not auth_cookies:
+            return None
+
+        cookie_str = ";".join([f"{key}={val}" for key, val in auth_cookies.items()])
+        request_body = urllib.parse.urlencode(
+            {"form_data": json.dumps(request_payload)}
+        ).encode("utf-8")
+        request = urllib.request.Request(
+            chart_url,
+            data=request_body,
+            headers={
+                "Content-Type": "application/x-www-form-urlencoded",
+                "Cookie": cookie_str,
+            },
+            method="POST",
+        )
+        response = urllib.request.build_opener().open(request)
+        content = response.read()
+        if response.getcode() != 200:
+            raise URLError(response.getcode())

Review Comment:
   **Suggestion:** The HTTP response object returned by 
`urllib.request.build_opener().open(request)` is never closed. In a 
long-running report worker this can leak sockets/file descriptors across many 
exports and eventually cause request failures. Wrap the open call in a context 
manager (`with ... as response`) so the connection is always released, 
including on exceptions. [resource leak]
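
A minimal sketch of the suggested change, written as a standalone function rather than the PR's actual method. The name `post_chart_data`, the injectable `opener` parameter, and the final `return content` are assumptions for illustration (the quoted hunk ends before the function's return); the request construction mirrors the diff above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional
from urllib.error import URLError


def post_chart_data(
    chart_url: str,
    auth_cookies: Optional[dict[str, str]],
    request_payload: dict[str, Any],
    opener: Optional[urllib.request.OpenerDirector] = None,  # hypothetical hook
) -> Optional[bytes]:
    """POST the chart data payload and always release the HTTP response."""
    if not auth_cookies:
        return None

    cookie_str = ";".join(f"{key}={val}" for key, val in auth_cookies.items())
    request_body = urllib.parse.urlencode(
        {"form_data": json.dumps(request_payload)}
    ).encode("utf-8")
    request = urllib.request.Request(
        chart_url,
        data=request_body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Cookie": cookie_str,
        },
        method="POST",
    )

    opener = opener or urllib.request.build_opener()
    # The context manager closes the response when the block exits,
    # whether it exits normally or because an exception was raised.
    with opener.open(request) as response:
        content = response.read()
        if response.getcode() != 200:
            raise URLError(response.getcode())
    # Assumed return value; the reviewed hunk does not show this line.
    return content
```

The `opener` argument exists only so the sketch can be exercised without a network call; the same `with` pattern applies when calling `urllib.request.build_opener().open(request)` directly.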
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Scheduled CSV report exports leak HTTP connections each execution.
   - ⚠️ Long-running report workers risk exhausting sockets/descriptors.
   - ⚠️ Subsequent chart data exports may fail with URLError.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a scheduled report whose `report_format` is CSV so it uses the report
   execution command in `superset/commands/report/execute.py` and is run periodically by the
   scheduler (see `run()` method at lines 13–31, which constructs and runs
   `ReportScheduleStateMachine` at lines 28–30).
   
   2. When the schedule fires, `ReportScheduleStateMachine.run()` eventually enters
   `ReportSuccessState.next()` (class defined around line 1102; the `next()` method calls
   `self.send()` at lines 28–29 in the 1149–1188 block), which invokes `send()` at lines
   880–887.
   
   3. The `send()` method at lines 880–887 calls `_get_notification_content()` at line 886;
   for a chart report with `report_format == ReportDataFormat.CSV` the
   `_get_notification_content()` branch at lines 774–778 calls `csv_data =
   self._get_csv_data()`.
   
   4. `_get_csv_data()` at lines 592–610 builds the chart data request payload and calls
   `self._post_chart_data(...)` at lines 610–614, which performs the HTTP request in
   `_post_chart_data()` at lines 565–590 by executing `response =
   urllib.request.build_opener().open(request)` (line 586), reading `content =
   response.read()` (line 587), and returning without ever closing `response`, leaving the
   underlying HTTP connection/socket open until garbage collection. Repeating this scheduled
   CSV export many times in the long‑running worker will accumulate unclosed HTTPResponse
   objects and sockets, eventually exhausting file descriptors or connections and causing
   future exports to fail with networking errors.
   ```
   </details>
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

