codeant-ai-for-open-source[bot] commented on code in PR #40331:
URL: https://github.com/apache/superset/pull/40331#discussion_r3281242806
##########
superset/commands/chart/data/streaming_export_command.py:
##########
@@ -66,7 +69,29 @@ def _get_sql_and_database(self) -> tuple[str, Any, str |
None, str | None]:
# Note: datasource should already be attached to a session from
query_context
datasource = self._query_context.datasource
query_obj = self._query_context.queries[0]
- sql_query = datasource.get_query_str(query_obj.to_dict())
+ query_dict = query_obj.to_dict()
+
+ # When ALLOW_FULL_CSV_EXPORT is enabled, raise the row limit so a
+ # "full" export is not silently capped at SQL_MAX_ROW. The ceiling is
+ # TABLE_VIZ_MAX_ROW_SERVER (a bounded, predictable maximum) rather than
+ # truly unlimited.
+ if is_feature_enabled("ALLOW_FULL_CSV_EXPORT"):
+ query_dict["row_limit"] = app.config["TABLE_VIZ_MAX_ROW_SERVER"]
Review Comment:
**Suggestion:** The row-limit override is applied for every streaming CSV
export whenever the feature flag is enabled, not just for explicit "full CSV"
exports. This makes regular CSV exports unexpectedly bypass `SQL_MAX_ROW` and
can return much larger datasets than requested. Gate this override on an
explicit full-export signal from the request (or only honor the caller-provided
`row_limit`) so normal CSV exports keep their intended cap. [incorrect
condition logic]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Dashboard "Export to .CSV" ignores chart-configured row_limit.
- ❌ Streaming CSV uses TABLE_VIZ_MAX_ROW_SERVER for all exports.
- ⚠️ Large unintended exports increase database and network load.
- ⚠️ Explore "Export to .CSV" streaming also over-returns rows.
- ⚠️ Feature flag semantics leak into non-full export behavior.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Enable the `ALLOW_FULL_CSV_EXPORT` feature flag in Superset configuration
so
`superset.is_feature_enabled("ALLOW_FULL_CSV_EXPORT")` returns True, and
ensure
`TABLE_VIZ_MAX_ROW_SERVER` is significantly larger than the typical chart
`row_limit`
(e.g., 1_000_000 vs 10_000).
2. On a dashboard with a large table chart, configure the chart's
`row_limit` to a modest
value (e.g., 10_000) and save. This `row_limit` is part of the chart form
data used in
`exportTable` in
`superset-frontend/src/dashboard/components/gridComponents/Chart/Chart.tsx`
lines 66–105,
where the non-full path keeps `exportFormData = formData`.
3. With the chart loaded and `queriesResponse` reflecting a large underlying
dataset
(rowcount ≥ `CSV_STREAMING_ROW_THRESHOLD`), click the regular "Export to
.CSV" menu item
(not "Export to full .CSV") in the dashboard header. This invokes
`exportCSV` →
`exportTable('csv', false)` in `Chart.tsx` lines 178–183, which:
- Leaves `exportFormData.row_limit` at the chart's configured limit
(e.g., 10_000),
- Computes `actualRowCount` from `queriesResponse` and, when it exceeds
`streamingThreshold`, sets `shouldUseStreaming = true`, triggering
`exportChart` with
`resultType: 'full', resultFormat: 'csv'` and an `onStartStreamingExport`
callback.
4. The client's streaming export handler posts the chart form data (including
`row_limit=10_000`) to `/api/v1/chart/data` handled by
`ChartDataRestApi.data` in
`superset/charts/data/api.py` lines 115–163, which builds a `QueryContext`
via
`_create_query_context_from_form` and runs `ChartDataCommand`. When the
result is CSV and
large, `_send_chart_response` (lines 257–236 of the same file) calls
`_should_use_streaming` and, since `actual_row_count >= threshold`,
delegates to
`_create_streaming_csv_response` (lines 230–260), which constructs
`StreamingCSVExportCommand(query_context, chunk_size)`.
5. Inside `StreamingCSVExportCommand._get_sql_and_database` in
`superset/commands/chart/data/streaming_export_command.py` lines 61–99, the
code converts
the first query object to a dict (`query_dict = query_obj.to_dict()`), then
unconditionally executes:
- `if is_feature_enabled("ALLOW_FULL_CSV_EXPORT"):`
` query_dict["row_limit"] = app.config["TABLE_VIZ_MAX_ROW_SERVER"]`
Since the flag is enabled, the chart's intended `row_limit` (10_000) is
overwritten
with `TABLE_VIZ_MAX_ROW_SERVER` (e.g., 1_000_000) for this streaming
export, even
though the user chose the regular "Export to .CSV" action.
6. The modified `query_dict` is passed to
`datasource.get_query_str_extended` /
`get_query_str`, generating SQL with the larger limit; `_get_row_limit` in
`StreamingCSVExportCommand` returns `None` (lines 101–108), so
`BaseStreamingCSVExportCommand._execute_query_and_stream` in
`superset/commands/streaming_export/base.py` lines 97–151 streams all rows
up to the
higher SQL limit. The resulting CSV contains far more rows than the chart's
configured
`row_limit`, demonstrating that the ALLOW_FULL_CSV_EXPORT override applies
to all
streaming CSV exports, not only explicit "full CSV" exports.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=4d5ebf995737494b974bbc79fe5ebd15&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=4d5ebf995737494b974bbc79fe5ebd15&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/chart/data/streaming_export_command.py
**Line:** 78:79
**Comment:**
*Incorrect Condition Logic: The row-limit override is applied for every
streaming CSV export whenever the feature flag is enabled, not just for
explicit "full CSV" exports. This makes regular CSV exports unexpectedly bypass
`SQL_MAX_ROW` and can return much larger datasets than requested. Gate this
override on an explicit full-export signal from the request (or only honor the
caller-provided `row_limit`) so normal CSV exports keep their intended cap.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40331&comment_hash=803ae7e776f3e24424e914963dd53d43af0957599987421d8383bd99543251d7&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40331&comment_hash=803ae7e776f3e24424e914963dd53d43af0957599987421d8383bd99543251d7&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]