codeant-ai-for-open-source[bot] commented on code in PR #40859:
URL: https://github.com/apache/superset/pull/40859#discussion_r3375032079
##########
superset/charts/client_processing.py:
##########
@@ -388,9 +389,10 @@ def apply_client_processing( # noqa: C901
if query["result_format"] == ChartDataResultFormat.JSON:
query["data"] = processed_df.to_dict()
elif query["result_format"] == ChartDataResultFormat.CSV:
- buf = StringIO()
- processed_df.to_csv(buf, index=show_default_index)
- buf.seek(0)
- query["data"] = buf.getvalue()
+ # Route through the formula-escaping CSV writer, consistent with the
+ # other CSV export paths (viz, query context, SQL Lab export).
+ query["data"] = csv.df_to_escaped_csv(
+ processed_df, index=show_default_index
+ )
Review Comment:
**Suggestion:** This new call can corrupt CSV output for non-default indexes
(common in pivot/table post-processing). `df_to_escaped_csv()` mutates object
columns using `df.at[idx, ...]` with `idx` from `enumerate`, which assumes a
`RangeIndex`; when the DataFrame index is labels/tuples, it writes to wrong/new
rows and can duplicate or misalign data. Keep the index-safe behavior by fixing
the escaping path to use label-safe iteration (or adjust input before calling)
so rows are not rewritten under incorrect index keys. [api mismatch]
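
The failure mode can be reproduced in isolation. The sketch below does not use Superset's code: `escape_cell` and `escape_positional` are hypothetical stand-ins that only mimic the pattern described above (a positional counter from `enumerate` passed to `df.at` as if it were an index label), and the escaping rule is a simplified assumption.

```python
import pandas as pd


def escape_cell(value: str) -> str:
    # Simplified stand-in for formula escaping (assumption, not Superset's rules).
    return "'" + value if value.startswith(("=", "+", "-", "@")) else value


def escape_positional(df: pd.DataFrame) -> pd.DataFrame:
    # Mimics the flagged pattern: `idx` is a position, but `.at` expects a label.
    out = df.copy()
    for name in out.columns:
        for idx, value in enumerate(df[name].values):
            if isinstance(value, str):
                out.at[idx, name] = escape_cell(value)
    return out


# Index made of flattened pivot labels instead of 0..N-1 integers.
df = pd.DataFrame(
    {"names": ["=SUM(A1)", "Edward"]},
    index=["('boy', 'Edward')", "('girl', 'Amy')"],
)

broken = escape_positional(df)
print(broken)
```

With a string-labelled index, the writes to labels `0` and `1` do not touch the existing rows. pandas adds new rows under those keys instead, so the frame grows and the original cells stay unescaped.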
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Pivot_table_v2 CSV exports with string metrics misalign rows.
- ❌ Post-processed CSV reports can contain duplicated or shifted values.
- ⚠️ Report recipients may trust incorrect pivot aggregations.
- ⚠️ Inconsistency between UI pivot view and downloaded CSV.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Create or use a Pivot Table v2 chart configured to request post-processed CSV data
(result_type="post_processed", result_format="csv") via the ChartData REST API, which is
handled by `ChartDataRestApi.get_data` in `superset/charts/data/api.py:81-181`. Ensure the
metric uses the "List Unique Values" aggregator so the pivot output contains string
values.

2. The query context execution returns a FULL result payload; in `_get_data_response`
(`superset/charts/data/api.py:64-96`) the result is passed to `_send_chart_response`,
which sees `result_type == ChartDataResultType.POST_PROCESSED` and calls
`apply_client_processing(result, form_data, datasource)` at
`superset/charts/data/api.py:232-234`.

3. Inside `apply_client_processing` (`superset/charts/client_processing.py:314-366`), the
CSV query branch parses the raw CSV into a DataFrame (`pd.read_csv` at lines 45-54) and
runs the pivot post-processor `pivot_table_v2`
(`superset/charts/client_processing.py:259-280`). `pivot_df` converts the index to a
`MultiIndex` of tuples of labels (e.g. `('boy', 'Edward')`) at lines 151-155, and then
`apply_client_processing` flattens this multi-level index into string labels at lines
81-87, so `processed_df.index` contains non-default string labels, not 0..N integers.

4. For CSV result_format, the new code calls `csv.df_to_escaped_csv(processed_df,
index=show_default_index)` at `superset/charts/client_processing.py:92-97`. In
`df_to_escaped_csv` (`superset/utils/csv.py:67-81`), each object-typed column (string
metric values from "List Unique Values") is iterated with `enumerate(column.values)` and
mutated using `df.at[idx, name]` where `idx` is the positional counter 0..N-1. Because
`processed_df.index` now consists of string labels like `"('boy', 'Edward')"` rather than
integer labels 0..N-1, these `.at[idx, name]` writes target non-existent or wrong index
labels, leading pandas to write values under incorrect/new index keys. The resulting
DataFrame, which is then serialized via `df.to_csv(...)`, yields a CSV whose rows are
duplicated or misaligned compared to the correct pivot output shown in the UI.
```
</details>
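
One possible shape for the label-safe iteration suggested in this comment is to transform each column as a whole, so that neither positions nor index labels are used for writes. This is a minimal sketch under stated assumptions, not the actual `df_to_escaped_csv` implementation: `escape_cell` and `escape_label_safe` are hypothetical names, and the escaping rule is simplified.

```python
import pandas as pd


def escape_cell(value: str) -> str:
    # Simplified stand-in for formula escaping (assumption, not Superset's rules).
    return "'" + value if value.startswith(("=", "+", "-", "@")) else value


def escape_label_safe(df: pd.DataFrame) -> pd.DataFrame:
    # Column-wise map keeps every value attached to its existing index label.
    out = df.copy()
    for name in out.columns:
        out[name] = out[name].map(
            lambda v: escape_cell(v) if isinstance(v, str) else v
        )
    return out


# Flattened pivot labels as the index, as described in step 3.
df = pd.DataFrame(
    {"names": ["=SUM(A1)", "Edward"], "count": [1, 2]},
    index=["('boy', 'Edward')", "('girl', 'Amy')"],
)

fixed = escape_label_safe(df)
print(fixed.to_csv(index=True))
```

Because the result of `map` is assigned back by column, the index is left unchanged and the row count stays the same regardless of whether the index is a `RangeIndex`, strings, or tuples.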
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/charts/client_processing.py
**Line:** 394:396
**Comment:**
*API mismatch:* This new call can corrupt CSV output for non-default
indexes (common in pivot/table post-processing). `df_to_escaped_csv()` mutates
object columns using `df.at[idx, ...]` with `idx` from `enumerate`, which
assumes a `RangeIndex`; when the DataFrame index is labels/tuples, it writes to
wrong/new rows and can duplicate or misalign data. Keep the index-safe behavior
by fixing the escaping path to use label-safe iteration (or adjust input before
calling) so rows are not rewritten under incorrect index keys.
Validate the correctness of the flagged issue. If it is correct, how can it be
resolved? If you propose a fix, implement it and keep it concise.
Once the fix is implemented, also check the other comments on the same PR and
ask the user whether to fix the rest of them as well. If the user says yes,
fetch all the comments, validate their correctness, and implement a minimal fix
for each.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]