rusackas commented on code in PR #40859:
URL: https://github.com/apache/superset/pull/40859#discussion_r3375304203


##########
superset/charts/client_processing.py:
##########
@@ -388,9 +389,10 @@ def apply_client_processing(  # noqa: C901
         if query["result_format"] == ChartDataResultFormat.JSON:
             query["data"] = processed_df.to_dict()
         elif query["result_format"] == ChartDataResultFormat.CSV:
-            buf = StringIO()
-            processed_df.to_csv(buf, index=show_default_index)
-            buf.seek(0)
-            query["data"] = buf.getvalue()
+            # Route through the formula-escaping CSV writer, consistent with 
the
+            # other CSV export paths (viz, query context, SQL Lab export).
+            query["data"] = csv.df_to_escaped_csv(
+                processed_df, index=show_default_index
+            )

Review Comment:
   Good catch — confirmed and fixed. `df_to_escaped_csv` was pairing positional 
`enumerate` offsets with the label-based `df.at` accessor, which created 
phantom rows and dropped the escaping on the real rows whenever the DataFrame 
had a non-default index (exactly the flattened pivot_table_v2 index this PR can 
route through). It now iterates by index label via `column.items()`, writing 
each escaped value back to the correct row while preserving the dtype of mixed 
object columns. Added a regression test 
(`test_df_to_escaped_csv_non_default_index`) covering the non-RangeIndex case.



##########
superset/charts/client_processing.py:
##########
@@ -388,9 +389,10 @@ def apply_client_processing(  # noqa: C901
         if query["result_format"] == ChartDataResultFormat.JSON:
             query["data"] = processed_df.to_dict()
         elif query["result_format"] == ChartDataResultFormat.CSV:
-            buf = StringIO()
-            processed_df.to_csv(buf, index=show_default_index)
-            buf.seek(0)
-            query["data"] = buf.getvalue()
+            # Route through the formula-escaping CSV writer, consistent with 
the
+            # other CSV export paths (viz, query context, SQL Lab export).
+            query["data"] = csv.df_to_escaped_csv(
+                processed_df, index=show_default_index
+            )

Review Comment:
   Confirmed and fixed in `superset/utils/csv.py`. Switched from positional 
`enumerate(column.values)` + `df.at[idx, name]` to label-based iteration with 
`column.items()`, so escaping works correctly for arbitrary (non-RangeIndex) 
indexes such as the pivot_table_v2 output. Dtype of mixed object columns is 
preserved since only string cells are reassigned.



##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2152,6 +2152,52 @@ def 
test_apply_client_processing_csv_format_simple_table():
     }
 
 
+def test_apply_client_processing_csv_format_escapes_formula_values():
+    """
+    A value starting with a formula trigger should be escaped in the CSV
+    output, consistent with the other CSV export paths.
+    """

Review Comment:
   Added a regression test for the non-default-index escaping path: 
`test_df_to_escaped_csv_non_default_index` in 
`tests/unit_tests/utils/csv_tests.py` builds a DataFrame with a non-RangeIndex 
(mimicking the flattened pivot index) and a formula-trigger string cell, and 
asserts the value is escaped in place with no duplicated/phantom rows.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to