codeant-ai-for-open-source[bot] commented on code in PR #38616:
URL: https://github.com/apache/superset/pull/38616#discussion_r2929299301


##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2788,3 +2788,45 @@ def 
test_apply_client_processing_csv_format_default_na_behavior():
     assert (
         "Alice," in lines[2]
     )  # Second data row should have empty last_name (NA converted to null)
+
+
+@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})
+def test_apply_client_processing_csv_format_custom_delimiter():
+    """
+    Test that apply_client_processing respects CSV_EXPORT sep and decimal 
config.
+    Without the fix, pd.read_csv() uses default comma separator and fails to 
parse
+    semicolon-delimited CSV correctly, causing HTTP 500 in email reports.
+    """
+    csv_data = "name;value\nfoo;1,5\nbar;2,0"
+
+    result = {
+        "queries": [
+            {
+                "result_format": ChartDataResultFormat.CSV,
+                "data": csv_data,
+            }
+        ]
+    }
+
+    form_data = {
+        "datasource": "1__table",
+        "viz_type": "table",
+        "slice_id": 1,
+        "url_params": {},
+        "metrics": [],
+        "groupby": [],
+        "columns": ["name", "value"],
+        "extra_form_data": {},
+        "force": False,
+        "result_format": "csv",
+        "result_type": "results",
+    }
+
+    processed_result = apply_client_processing(result, form_data)
+
+    output_data = processed_result["queries"][0]["data"]
+    lines = output_data.strip().split("\n")
+    # Should have header + 2 data rows, with correct column parsing
+    assert len(lines) == 3
+    # name and value should be separate columns, not merged into one
+    assert processed_result["queries"][0]["colnames"] == ["name", "value"]

Review Comment:
   **Suggestion:** This test does not actually verify that the configured 
delimiter/decimal are preserved in the generated CSV output, so it can pass 
even when email CSV exports are still emitted with default comma and dot 
formatting. Strengthen the assertions to validate the serialized CSV content 
and numeric typing of the parsed `value` column. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Scheduled CSV emails may ignore configured separator/decimal output.
   - ⚠️ Current unit test misses CSV serialization regressions.
   ```
   </details>
   
   ```suggestion
       output_data = processed_result["queries"][0]["data"]
       lines = output_data.strip().split("\n")
       # Should have header + 2 data rows, with correct column parsing
       assert len(lines) == 3
       # name and value should be separate columns, not merged into one
       assert processed_result["queries"][0]["colnames"] == ["name", "value"]
       # value should be parsed as numeric using decimal="," config
       assert processed_result["queries"][0]["coltypes"] == [
           GenericDataType.STRING,
           GenericDataType.NUMERIC,
       ]
       # serialized CSV should honor configured separator/decimal for email 
exports
       assert lines[0] == "name;value"
       assert "foo;1,5" in lines[1]
       assert "bar;2,0" in lines[2]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a report with non-default CSV settings
   (`CSV_EXPORT={"sep":";","decimal":","}`), then trigger scheduled CSV 
generation through
   `ReportScheduleStateMachine._get_csv_data()` in
   `superset/commands/report/execute.py:439-443`, which calls 
`ChartDataRestApi.get_data`
   with `type=post_processed` at `execute.py:226-230`.
   
   2. In chart response handling, `_send_chart_response()` applies 
post-processing when
   `result_type == POST_PROCESSED` (`superset/charts/data/api.py:404-405`), 
invoking
   `apply_client_processing()`.
   
   3. In `apply_client_processing()`, CSV is parsed with configured 
`sep`/`decimal`
   (`superset/charts/client_processing.py:348-356`) but serialized back using 
default
   `processed_df.to_csv(...)` without config (`client_processing.py:397`), 
yielding comma/dot
   formatted output.
   
   4. Run the current unit test 
`test_apply_client_processing_csv_format_custom_delimiter`
   (`tests/unit_tests/charts/test_client_processing.py:2794-2832`): it only 
asserts row count
   and `colnames` (`2827-2832`), so it still passes even when output CSV 
formatting is wrong.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/charts/test_client_processing.py
   **Line:** 2827:2832
   **Comment:**
        *Logic Error: This test does not actually verify that the configured 
delimiter/decimal are preserved in the generated CSV output, so it can pass 
even when email CSV exports are still emitted with default comma and dot 
formatting. Strengthen the assertions to validate the serialized CSV content 
and numeric typing of the parsed `value` column.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38616&comment_hash=2171f418a011e134c41f6f12c424bdcfde87be2166f47790c63e9b2a0b3dad05&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38616&comment_hash=2171f418a011e134c41f6f12c424bdcfde87be2166f47790c63e9b2a0b3dad05&reaction=dislike'>👎</a>



##########
superset/charts/client_processing.py:
##########
@@ -345,10 +345,15 @@ def apply_client_processing(  # noqa: C901
             # reports to avoid unwanted conversions
             # This allows users to control which values should be treated as 
null/NA
             na_values = current_app.config["REPORTS_CSV_NA_NAMES"]
+            csv_export_config = current_app.config.get("CSV_EXPORT", {})

Review Comment:
   **Suggestion:** The CSV parser now respects configured delimiter/decimal, 
but the CSV writer still uses pandas defaults, so processed email report output 
is re-serialized with `,` and `.` instead of the configured format. This breaks 
the intended contract for localized CSV exports and can corrupt round-tripping 
for semicolon/comma-decimal setups. Reuse the same `sep` and `decimal` values 
when calling `to_csv`. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Scheduled CSV reports ignore configured output separators.
   - ⚠️ Locale decimal formatting changes in emailed CSV files.
   - ⚠️ Behavior diverges from GUI CSV export formatting.
   ```
   </details>
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to