codeant-ai-for-open-source[bot] commented on code in PR #38170:
URL: https://github.com/apache/superset/pull/38170#discussion_r2838475431


##########
superset/commands/streaming_export/base.py:
##########
@@ -107,16 +107,49 @@ def _write_csv_header(
         buffer.truncate()
         return header_data, total_bytes
 
+    def _format_row_values(
+        self, row: tuple[Any, ...], decimal_separator: str | None
+    ) -> list[Any]:
+        """
+        Format row values, applying custom decimal separator if specified.
+
+        Args:
+            row: Database row as a tuple
+            decimal_separator: Custom decimal separator (e.g., ",") or None
+
+        Returns:
+            List of formatted values
+        """
+        if not decimal_separator or decimal_separator == ".":
+            return list(row)
+
+        formatted = []
+        for value in row:
+            if isinstance(value, float):
+                # Format float with custom decimal separator

Review Comment:
   **Suggestion:** The decimal separator configuration is only applied to 
values of type `float`, so if the database driver returns numeric columns as 
other numeric types like `decimal.Decimal` (which is common for NUMERIC/DECIMAL 
columns), those values will bypass `_format_row_values` and still use a dot as 
decimal separator, causing streaming CSV exports to ignore the configured 
decimal separator for many numeric columns; broaden the type check to cover 
generic numeric types so all numeric values respect the `decimal` setting. 
[logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Streaming CSV from SQL Lab misformats DECIMAL numeric columns.
   - ❌ Chart data streaming export misformats DECIMAL metrics with comma.
   - ⚠️ Locale-specific tools misparse numbers due to dot decimal.
   ```
   </details>
   
   ```suggestion
           from numbers import Real
   
           formatted: list[Any] = []
           for value in row:
               if isinstance(value, Real):
                   # Format numeric values with custom decimal separator
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure Superset to use a custom decimal separator by setting 
`CSV_EXPORT =
   {"encoding": "utf-8", "sep": ";", "decimal": ","}` in `superset_config.py` 
(this
   configuration is read in 
`superset/commands/streaming_export/base.py:195-197` when
   `_execute_query_and_stream()` obtains `decimal_separator`).
   
   2. In a supported database where SQLAlchemy maps NUMERIC/DECIMAL columns to
   `decimal.Decimal` (standard behavior for many engines), create or use a 
table with a
   DECIMAL/NUMERIC column and insert a value like `1234.56`; query it either 
via a chart or
   SQL Lab so that Superset executes a SELECT returning that DECIMAL column.
   
   3. Trigger a streaming CSV export for that query: for charts, the export 
path uses
   `StreamingCSVExportCommand` from
   `superset/commands/chart/data/streaming_export_command.py:29` which 
subclasses
   `BaseStreamingCSVExportCommand`; for SQL Lab, the API in 
`superset/sqllab/api.py:405`
   constructs `StreamingSqlResultExportCommand(client_id, chunk_size)` (per 
Grep output)
   which also subclasses `BaseStreamingCSVExportCommand`.
   
   4. During the export, 
`BaseStreamingCSVExportCommand._execute_query_and_stream()` in
   `superset/commands/streaming_export/base.py:187-243` streams rows; each 
batch is processed
   by `_process_rows()` at lines 135-185, which calls `_format_row_values(row,
   decimal_separator)` at lines 164-165 with `decimal_separator=","`. Because 
the current
   implementation at lines 110-133 only checks `isinstance(value, float)`, any
   `decimal.Decimal` values in the row are treated as non-floats and appended 
unchanged, so
   `csv.writer` writes strings like `"1234.56"` with a dot instead of 
`"1234,56"`, ignoring
   the configured decimal separator for these numeric columns.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/streaming_export/base.py
   **Line:** 126:129
   **Comment:**
        *Logic Error: The decimal separator configuration is only applied to 
values of type `float`, so if the database driver returns numeric columns as 
other numeric types like `decimal.Decimal` (which is common for NUMERIC/DECIMAL 
columns), those values will bypass `_format_row_values` and still use a dot as 
decimal separator, causing streaming CSV exports to ignore the configured 
decimal separator for many numeric columns; broaden the type check to cover 
generic numeric types so all numeric values respect the `decimal` setting.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=2e2858979147af8b2efd48b750b42ddf7695dbdac971f93fa5f21af941bd037d&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=2e2858979147af8b2efd48b750b42ddf7695dbdac971f93fa5f21af941bd037d&reaction=dislike'>👎</a>



##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2788,3 +2788,102 @@ def 
test_apply_client_processing_csv_format_default_na_behavior():
     assert (
         "Alice," in lines[2]
     )  # Second data row should have empty last_name (NA converted to null)
+
+
+@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})
+def test_apply_client_processing_csv_format_custom_separator():
+    """
+    Test that apply_client_processing respects CSV_EXPORT config
+    for custom separator and decimal character.
+
+    This is a regression test for GitHub issue #32371.
+    """
+    # CSV data with numeric values
+    csv_data = "name,value\nAlice,1.5\nBob,2.75"
+
+    result = {
+        "queries": [
+            {
+                "result_format": ChartDataResultFormat.CSV,
+                "data": csv_data,
+            }
+        ]
+    }
+
+    form_data = {
+        "datasource": "1__table",
+        "viz_type": "table",
+        "slice_id": 1,
+        "url_params": {},
+        "metrics": [],
+        "groupby": [],
+        "columns": ["name", "value"],
+        "extra_form_data": {},
+        "force": False,
+        "result_format": "csv",
+        "result_type": "results",
+    }
+
+    processed_result = apply_client_processing(result, form_data)
+
+    output_data = processed_result["queries"][0]["data"]
+    lines = output_data.strip().split("\n")
+
+    # With sep=";", columns should be separated by semicolon
+    assert "name;value" in lines[0]
+    # With decimal=",", decimal values should use comma as separator
+    assert "Alice;1,5" in lines[1] or "Alice;1.5" in lines[1]
+    assert "Bob;2,75" in lines[2] or "Bob;2.75" in lines[2]
+
+
+@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})
+def test_apply_client_processing_csv_pivot_table_custom_separator():
+    """
+    Test that apply_client_processing respects CSV_EXPORT config
+    for pivot table exports with custom separator and decimal character.
+
+    This is a regression test for GitHub issue #32371 - specifically for
+    pivoted CSV exports which were not respecting the CSV_EXPORT config.
+    """
+    # CSV data with a numeric metric
+    csv_data = "COUNT(metric)\n1234.56"
+
+    result = {
+        "queries": [
+            {
+                "result_format": ChartDataResultFormat.CSV,
+                "data": csv_data,
+            }
+        ]
+    }
+
+    form_data = {
+        "datasource": "1__table",
+        "viz_type": "pivot_table_v2",
+        "slice_id": 1,
+        "url_params": {},
+        "groupbyColumns": [],
+        "groupbyRows": [],
+        "metrics": [
+            {
+                "aggregate": "COUNT",
+                "column": {"column_name": "metric"},
+                "expressionType": "SIMPLE",
+                "label": "COUNT(metric)",
+            }
+        ],
+        "metricsLayout": "COLUMNS",
+        "aggregateFunction": "Sum",
+        "extra_form_data": {},
+        "force": False,
+        "result_format": "csv",
+        "result_type": "results",
+    }
+
+    processed_result = apply_client_processing(result, form_data)
+
+    output_data = processed_result["queries"][0]["data"]
+
+    # The output should use semicolon as separator and comma as decimal
+    # After pivoting, the format should reflect CSV_EXPORT settings
+    assert ";" in output_data or "," in output_data

Review Comment:
   **Suggestion:** The pivot-table CSV regression test only asserts that the 
output contains either a semicolon or comma, which will usually be true 
regardless of whether the CSV_EXPORT separator/decimal configuration is 
honored, so it does not actually verify the intended behavior. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Pivot-table CSV regression test barely checks separator behavior.
   - ⚠️ Pivot CSV_EXPORT settings regressions could slip into production.
   ```
   </details>
   
   ```suggestion
       lines = output_data.strip().split("\n")
   
       # The output should use semicolon as separator and comma as decimal
       # After pivoting, the format should reflect CSV_EXPORT settings
       assert lines[0] == ";COUNT(metric)"
       assert "Total (Sum);1234,56" in lines[1]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `test_apply_client_processing_csv_pivot_table_custom_separator()` in
   `tests/unit_tests/charts/test_client_processing.py:2839-2889`, which sets 
`CSV_EXPORT =
   {"sep": ";", "decimal": ","}` and calls `apply_client_processing()` with
   `viz_type="pivot_table_v2"` and `result_format="csv"`.
   
   2. Inspect `apply_client_processing()` in 
`superset/charts/client_processing.py:312-398`,
   where pivoted CSV results are produced by `pivot_table_v2()` and written via
   `processed_df.to_csv(buf, index=show_default_index, **csv_export_config)` at 
lines 391-395
   using the `CSV_EXPORT` configuration.
   
   3. Introduce a regression by modifying the `to_csv` call in 
`apply_client_processing()`
   (lines 391-395) so that it ignores `CSV_EXPORT` (e.g., remove 
`**csv_export_config` or
   hard-code default `sep=","`, `decimal="."`), causing the pivoted CSV to use 
default
   separators and decimals.
   
   4. Run `pytest
   
tests/unit_tests/charts/test_client_processing.py::test_apply_client_processing_csv_pivot_table_custom_separator`;
   observe that the test still passes, because the only assertion at
   `tests/unit_tests/charts/test_client_processing.py:2883-2889` merely checks 
that the
   output string contains either `";"` or `","`, which is true for any CSV 
output regardless
   of whether `CSV_EXPORT` was honored.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/charts/test_client_processing.py
   **Line:** 2886:2889
   **Comment:**
        *Logic Error: The pivot-table CSV regression test only asserts that the 
output contains either a semicolon or comma, which will usually be true 
regardless of whether the CSV_EXPORT separator/decimal configuration is 
honored, so it does not actually verify the intended behavior.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=56c6b6faac0df1597dd89738e07680475a30b48ca217f3e277ae70db03702f7b&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=56c6b6faac0df1597dd89738e07680475a30b48ca217f3e277ae70db03702f7b&reaction=dislike'>👎</a>



##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2788,3 +2788,102 @@ def 
test_apply_client_processing_csv_format_default_na_behavior():
     assert (
         "Alice," in lines[2]
     )  # Second data row should have empty last_name (NA converted to null)
+
+
+@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})
+def test_apply_client_processing_csv_format_custom_separator():
+    """
+    Test that apply_client_processing respects CSV_EXPORT config
+    for custom separator and decimal character.
+
+    This is a regression test for GitHub issue #32371.
+    """
+    # CSV data with numeric values
+    csv_data = "name,value\nAlice,1.5\nBob,2.75"
+
+    result = {
+        "queries": [
+            {
+                "result_format": ChartDataResultFormat.CSV,
+                "data": csv_data,
+            }
+        ]
+    }
+
+    form_data = {
+        "datasource": "1__table",
+        "viz_type": "table",
+        "slice_id": 1,
+        "url_params": {},
+        "metrics": [],
+        "groupby": [],
+        "columns": ["name", "value"],
+        "extra_form_data": {},
+        "force": False,
+        "result_format": "csv",
+        "result_type": "results",
+    }
+
+    processed_result = apply_client_processing(result, form_data)
+
+    output_data = processed_result["queries"][0]["data"]
+    lines = output_data.strip().split("\n")
+
+    # With sep=";", columns should be separated by semicolon
+    assert "name;value" in lines[0]
+    # With decimal=",", decimal values should use comma as separator
+    assert "Alice;1,5" in lines[1] or "Alice;1.5" in lines[1]
+    assert "Bob;2,75" in lines[2] or "Bob;2.75" in lines[2]

Review Comment:
   **Suggestion:** The regression test for custom CSV separator and decimal 
allows both comma and dot decimals, so it will still pass even if the 
CSV_EXPORT decimal setting is ignored, failing to detect regressions in numeric 
formatting. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Unit test for CSV decimal formatting may silently pass.
   - ⚠️ Table CSV exports with custom decimal can regress unnoticed.
   ```
   </details>
   
   ```suggestion
       assert lines[0] == "name;value"
       # With decimal=",", decimal values should use comma as separator
       assert "Alice;1,5" in lines[1]
       assert "Bob;2,75" in lines[2]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `test_apply_client_processing_csv_format_custom_separator()` in
   `tests/unit_tests/charts/test_client_processing.py:2793-2827` (added in this 
PR) which
   uses `@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})` and calls
   `apply_client_processing()`.
   
   2. Note that `apply_client_processing()` in 
`superset/charts/client_processing.py:312-398`
   formats CSV exports via `processed_df.to_csv(...)` using `csv_export_config =
   current_app.config.get("CSV_EXPORT", {})` at lines 391-395.
   
   3. Introduce a regression by editing `processed_df.to_csv(buf, 
index=show_default_index,
   **csv_export_config)` at `superset/charts/client_processing.py:393-394` so 
that it no
   longer passes the `decimal` option (e.g., only passes `sep`), causing 
numeric values in
   the exported table CSV to retain a `.` decimal separator.
   
   4. Run `pytest
   
tests/unit_tests/charts/test_client_processing.py::test_apply_client_processing_csv_format_custom_separator`;
   observe the test still passes because the assertions at
   `tests/unit_tests/charts/test_client_processing.py:2832-2836` accept both 
`"Alice;1,5"`
   and `"Alice;1.5"` (and likewise for Bob), so the regression in honoring
   `CSV_EXPORT["decimal"]` is not detected.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/charts/test_client_processing.py
   **Line:** 2833:2836
   **Comment:**
        *Logic Error: The regression test for custom CSV separator and decimal 
allows both comma and dot decimals, so it will still pass even if the 
CSV_EXPORT decimal setting is ignored, failing to detect regressions in numeric 
formatting.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=5c5b778b282a68d4599918e0d587c90835840011f871e3c76d19ea6897a53b3e&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=5c5b778b282a68d4599918e0d587c90835840011f871e3c76d19ea6897a53b3e&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to