codeant-ai-for-open-source[bot] commented on code in PR #40859:
URL: https://github.com/apache/superset/pull/40859#discussion_r3375032079


##########
superset/charts/client_processing.py:
##########
@@ -388,9 +389,10 @@ def apply_client_processing(  # noqa: C901
         if query["result_format"] == ChartDataResultFormat.JSON:
             query["data"] = processed_df.to_dict()
         elif query["result_format"] == ChartDataResultFormat.CSV:
-            buf = StringIO()
-            processed_df.to_csv(buf, index=show_default_index)
-            buf.seek(0)
-            query["data"] = buf.getvalue()
+            # Route through the formula-escaping CSV writer, consistent with the
+            # other CSV export paths (viz, query context, SQL Lab export).
+            query["data"] = csv.df_to_escaped_csv(
+                processed_df, index=show_default_index
+            )

Review Comment:
   **Suggestion:** This new call can corrupt CSV output when the DataFrame has a
non-default index, which is common after pivot/table post-processing.
`df_to_escaped_csv()` mutates object columns via `df.at[idx, ...]` with `idx`
taken from `enumerate`, which assumes a `RangeIndex`. When the index holds labels
(strings or tuples), those writes land on the wrong rows or create new ones, which
duplicates or misaligns data. Preserve the index-safe behavior by making the
escaping path iterate by label (or by adjusting the input before the call) so
rows are not rewritten under incorrect index keys. [api mismatch]
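
   A minimal sketch of the label-safe iteration suggested above, assuming pandas is available. `escape_value` is a hypothetical, simplified stand-in for Superset's formula-escaping helper, and `df_to_escaped_csv_label_safe` is an illustrative name, not an existing Superset function. Instead of writing cells back with positional `df.at[idx, ...]`, it rebuilds each string column with `Series.map`, which keeps the original index, so every escaped value stays on its own row whatever the index type.

```python
import pandas as pd


def escape_value(value: str) -> str:
    # Hypothetical, simplified stand-in for a formula-escaping helper:
    # prefix a single quote when a cell starts with a spreadsheet trigger.
    # This is NOT Superset's implementation.
    return "'" + value if value[:1] in ("=", "+", "-", "@", "|", "%") else value


def _escape_cell(value: object) -> object:
    # Only strings are escaped; numbers, NaN, etc. pass through unchanged.
    return escape_value(value) if isinstance(value, str) else value


def df_to_escaped_csv_label_safe(df: pd.DataFrame, **kwargs) -> str:
    """Escape string cells without positional writes, then serialize to CSV.

    Assumes unique column names (the df[name] assignment below relies on it).
    """
    # rename() returns a new frame, so the caller's DataFrame is not mutated.
    df = df.rename(columns=_escape_cell)
    for name, column in df.items():
        if pd.api.types.is_string_dtype(column.dtype):
            # Series.map preserves the index, so each escaped value is written
            # back to the row it came from; no label lookups by position.
            df[name] = column.map(_escape_cell)
    return df.to_csv(**kwargs)
```

   Replacing whole columns avoids per-cell `.at` writes entirely; the trade-off is that `df[name] = ...` assumes unique column names, which a production fix would need to handle (for example with positional column assignment).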
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Pivot_table_v2 CSV exports with string metrics misalign rows.
   - ❌ Post-processed CSV reports can contain duplicated or shifted values.
   - ⚠️ Report recipients may trust incorrect pivot aggregations.
   - ⚠️ Inconsistency between UI pivot view and downloaded CSV.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create or use a Pivot Table v2 chart configured to request post-processed CSV
   data (result_type="post_processed", result_format="csv") via the ChartData REST
   API, which is handled by `ChartDataRestApi.get_data` in
   `superset/charts/data/api.py:81-181`. Ensure the metric uses the "List Unique
   Values" aggregator so the pivot output contains string values.
   
   2. The query context execution returns a FULL result payload; in
   `_get_data_response` (`superset/charts/data/api.py:64-96`) the result is passed
   to `_send_chart_response`, which sees
   `result_type == ChartDataResultType.POST_PROCESSED` and calls
   `apply_client_processing(result, form_data, datasource)` at
   `superset/charts/data/api.py:232-234`.
   
   3. Inside `apply_client_processing` (`superset/charts/client_processing.py:314-366`),
   the CSV query branch parses the raw CSV into a DataFrame (`pd.read_csv` at lines
   45-54) and runs the pivot post-processor `pivot_table_v2`
   (`superset/charts/client_processing.py:259-280`). `pivot_df` converts the index
   to a `MultiIndex` of label tuples (e.g. `('boy', 'Edward')`) at lines 151-155,
   and `apply_client_processing` then flattens this multi-level index into string
   labels at lines 81-87, so `processed_df.index` contains non-default string
   labels rather than the integers 0..N-1.
   
   4. For the CSV result_format, the new code calls
   `csv.df_to_escaped_csv(processed_df, index=show_default_index)` at
   `superset/charts/client_processing.py:394-396`. In `df_to_escaped_csv`
   (`superset/utils/csv.py:67-81`), each object-typed column (the string metric
   values produced by "List Unique Values") is iterated with
   `enumerate(column.values)` and mutated via `df.at[idx, name]`, where `idx` is the
   positional counter 0..N-1. Because `processed_df.index` now consists of string
   labels like `"('boy', 'Edward')"` rather than the integers 0..N-1, these
   `.at[idx, name]` writes target non-existent or wrong index labels, so pandas
   stores the escaped values under new or incorrect index keys. The resulting
   DataFrame, which is then serialized via `df.to_csv(...)`, yields a CSV whose rows
   are duplicated or misaligned relative to the correct pivot output shown in the UI.
   ```
   </details>
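
   The mechanism in step 4 can be reproduced with plain pandas. This is a sketch under stated assumptions: `escape_value` is a hypothetical, simplified stand-in; `escape_positionally` only mirrors the `enumerate` + `df.at[idx, name]` pattern described above rather than Superset's exact code; and `make_pivot_like_frame` approximates step 3 by calling `str()` on each `MultiIndex` tuple (Superset's actual flattening may differ).

```python
import pandas as pd


def escape_value(value: str) -> str:
    # Hypothetical, simplified stand-in for a formula-escaping helper.
    return "'" + value if value[:1] in ("=", "+", "-", "@", "|", "%") else value


def escape_positionally(df: pd.DataFrame) -> pd.DataFrame:
    """Mirror the flagged pattern: enumerate() counters used as .at row labels."""
    for name, column in df.items():
        if pd.api.types.is_string_dtype(column.dtype):
            for idx, value in enumerate(column.values):
                if isinstance(value, str):
                    # idx is a position (0, 1, ...), but .at expects a row label.
                    # When no row carries that label, pandas enlarges the frame
                    # instead of updating the intended row.
                    df.at[idx, name] = escape_value(value)
    return df


def make_pivot_like_frame() -> pd.DataFrame:
    """Approximate step 3: a pivot MultiIndex flattened into string labels."""
    index = pd.MultiIndex.from_tuples([("boy", "Edward"), ("girl", "Ann")])
    df = pd.DataFrame({"metric": ["=SUM(A1)", "ok"], "n": [1, 2]}, index=index)
    # Assumed flattening scheme: str() of each tuple, e.g. "('boy', 'Edward')".
    df.index = [str(label) for label in df.index]
    return df
```

   With this input, `escape_positionally(make_pivot_like_frame())` leaves the two original rows unescaped and appends two extra rows labelled `0` and `1` that hold the escaped `metric` values and `NaN` in `n`, because `.at` falls back to setting with enlargement when the label is missing.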
   
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/charts/client_processing.py
   **Line:** 394:396
   **Comment:**
        *Api Mismatch: This new call can corrupt CSV output for non-default indexes (common in pivot/table post-processing). `df_to_escaped_csv()` mutates object columns using `df.at[idx, ...]` with `idx` from `enumerate`, which assumes a `RangeIndex`; when the DataFrame index is labels/tuples, it writes to wrong/new rows and can duplicate or misalign data. Keep the index-safe behavior by fixing the escaping path to use label-safe iteration (or adjust input before calling) so rows are not rewritten under incorrect index keys.
   
   Validate the correctness of the flagged issue. If it is correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   Once the fix is implemented, also check the other comments on the same PR and ask the user whether they want to fix the rest of the comments as well. If they say yes, fetch all the comments, validate their correctness, and implement a minimal fix.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
For additional commands, e-mail: [email protected]
