codeant-ai-for-open-source[bot] commented on code in PR #38170:
URL: https://github.com/apache/superset/pull/38170#discussion_r2838475431
##########
superset/commands/streaming_export/base.py:
##########
@@ -107,16 +107,49 @@ def _write_csv_header(
buffer.truncate()
return header_data, total_bytes
+ def _format_row_values(
+ self, row: tuple[Any, ...], decimal_separator: str | None
+ ) -> list[Any]:
+ """
+ Format row values, applying custom decimal separator if specified.
+
+ Args:
+ row: Database row as a tuple
+ decimal_separator: Custom decimal separator (e.g., ",") or None
+
+ Returns:
+ List of formatted values
+ """
+ if not decimal_separator or decimal_separator == ".":
+ return list(row)
+
+ formatted = []
+ for value in row:
+ if isinstance(value, float):
+ # Format float with custom decimal separator
Review Comment:
**Suggestion:** The decimal separator configuration is only applied to
values of type `float`, so if the database driver returns numeric columns as
other numeric types like `decimal.Decimal` (which is common for NUMERIC/DECIMAL
columns), those values will bypass `_format_row_values` and still use a dot as
decimal separator, causing streaming CSV exports to ignore the configured
decimal separator for many numeric columns; broaden the type check to cover
generic numeric types so all numeric values respect the `decimal` setting.
[logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Streaming CSV from SQL Lab misformats DECIMAL numeric columns.
- ❌ Chart data streaming export misformats DECIMAL metrics with comma.
- ⚠️ Locale-specific tools misparse numbers due to dot decimal.
```
</details>
```suggestion
from numbers import Real
formatted: list[Any] = []
for value in row:
if isinstance(value, Real):
# Format numeric values with custom decimal separator
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure Superset to use a custom decimal separator by setting
`CSV_EXPORT =
{"encoding": "utf-8", "sep": ";", "decimal": ","}` in `superset_config.py`
(this
configuration is read in
`superset/commands/streaming_export/base.py:195-197` when
`_execute_query_and_stream()` obtains `decimal_separator`).
2. In a supported database where SQLAlchemy maps NUMERIC/DECIMAL columns to
`decimal.Decimal` (standard behavior for many engines), create or use a
table with a
DECIMAL/NUMERIC column and insert a value like `1234.56`; query it either
via a chart or
SQL Lab so that Superset executes a SELECT returning that DECIMAL column.
3. Trigger a streaming CSV export for that query: for charts, the export
path uses
`StreamingCSVExportCommand` from
`superset/commands/chart/data/streaming_export_command.py:29` which
subclasses
`BaseStreamingCSVExportCommand`; for SQL Lab, the API in
`superset/sqllab/api.py:405`
constructs `StreamingSqlResultExportCommand(client_id, chunk_size)` (per
Grep output)
which also subclasses `BaseStreamingCSVExportCommand`.
4. During the export,
`BaseStreamingCSVExportCommand._execute_query_and_stream()` in
`superset/commands/streaming_export/base.py:187-243` streams rows; each
batch is processed
by `_process_rows()` at lines 135-185, which calls `_format_row_values(row,
decimal_separator)` at lines 164-165 with `decimal_separator=","`. Because
the current
implementation at lines 110-133 only checks `isinstance(value, float)`, any
`decimal.Decimal` values in the row are treated as non-floats and appended
unchanged, so
`csv.writer` writes strings like `"1234.56"` with a dot instead of
`"1234,56"`, ignoring
the configured decimal separator for these numeric columns.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/streaming_export/base.py
**Line:** 126:129
**Comment:**
*Logic Error: The decimal separator configuration is only applied to
values of type `float`, so if the database driver returns numeric columns as
other numeric types like `decimal.Decimal` (which is common for NUMERIC/DECIMAL
columns), those values will bypass `_format_row_values` and still use a dot as
decimal separator, causing streaming CSV exports to ignore the configured
decimal separator for many numeric columns; broaden the type check to cover
generic numeric types so all numeric values respect the `decimal` setting.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=2e2858979147af8b2efd48b750b42ddf7695dbdac971f93fa5f21af941bd037d&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=2e2858979147af8b2efd48b750b42ddf7695dbdac971f93fa5f21af941bd037d&reaction=dislike'>👎</a>
##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2788,3 +2788,102 @@ def
test_apply_client_processing_csv_format_default_na_behavior():
assert (
"Alice," in lines[2]
) # Second data row should have empty last_name (NA converted to null)
+
+
+@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})
+def test_apply_client_processing_csv_format_custom_separator():
+ """
+ Test that apply_client_processing respects CSV_EXPORT config
+ for custom separator and decimal character.
+
+ This is a regression test for GitHub issue #32371.
+ """
+ # CSV data with numeric values
+ csv_data = "name,value\nAlice,1.5\nBob,2.75"
+
+ result = {
+ "queries": [
+ {
+ "result_format": ChartDataResultFormat.CSV,
+ "data": csv_data,
+ }
+ ]
+ }
+
+ form_data = {
+ "datasource": "1__table",
+ "viz_type": "table",
+ "slice_id": 1,
+ "url_params": {},
+ "metrics": [],
+ "groupby": [],
+ "columns": ["name", "value"],
+ "extra_form_data": {},
+ "force": False,
+ "result_format": "csv",
+ "result_type": "results",
+ }
+
+ processed_result = apply_client_processing(result, form_data)
+
+ output_data = processed_result["queries"][0]["data"]
+ lines = output_data.strip().split("\n")
+
+ # With sep=";", columns should be separated by semicolon
+ assert "name;value" in lines[0]
+ # With decimal=",", decimal values should use comma as separator
+ assert "Alice;1,5" in lines[1] or "Alice;1.5" in lines[1]
+ assert "Bob;2,75" in lines[2] or "Bob;2.75" in lines[2]
+
+
+@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})
+def test_apply_client_processing_csv_pivot_table_custom_separator():
+ """
+ Test that apply_client_processing respects CSV_EXPORT config
+ for pivot table exports with custom separator and decimal character.
+
+ This is a regression test for GitHub issue #32371 - specifically for
+ pivoted CSV exports which were not respecting the CSV_EXPORT config.
+ """
+ # CSV data with a numeric metric
+ csv_data = "COUNT(metric)\n1234.56"
+
+ result = {
+ "queries": [
+ {
+ "result_format": ChartDataResultFormat.CSV,
+ "data": csv_data,
+ }
+ ]
+ }
+
+ form_data = {
+ "datasource": "1__table",
+ "viz_type": "pivot_table_v2",
+ "slice_id": 1,
+ "url_params": {},
+ "groupbyColumns": [],
+ "groupbyRows": [],
+ "metrics": [
+ {
+ "aggregate": "COUNT",
+ "column": {"column_name": "metric"},
+ "expressionType": "SIMPLE",
+ "label": "COUNT(metric)",
+ }
+ ],
+ "metricsLayout": "COLUMNS",
+ "aggregateFunction": "Sum",
+ "extra_form_data": {},
+ "force": False,
+ "result_format": "csv",
+ "result_type": "results",
+ }
+
+ processed_result = apply_client_processing(result, form_data)
+
+ output_data = processed_result["queries"][0]["data"]
+
+ # The output should use semicolon as separator and comma as decimal
+ # After pivoting, the format should reflect CSV_EXPORT settings
+ assert ";" in output_data or "," in output_data
Review Comment:
**Suggestion:** The pivot-table CSV regression test only asserts that the
output contains either a semicolon or comma, which will usually be true
regardless of whether the CSV_EXPORT separator/decimal configuration is
honored, so it does not actually verify the intended behavior. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Pivot-table CSV regression test barely checks separator behavior.
- ⚠️ Pivot CSV_EXPORT settings regressions could slip into production.
```
</details>
```suggestion
lines = output_data.strip().split("\n")
# The output should use semicolon as separator and comma as decimal
# After pivoting, the format should reflect CSV_EXPORT settings
assert lines[0] == ";COUNT(metric)"
assert "Total (Sum);1234,56" in lines[1]
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open `test_apply_client_processing_csv_pivot_table_custom_separator()` in
`tests/unit_tests/charts/test_client_processing.py:2839-2889`, which sets
`CSV_EXPORT =
{"sep": ";", "decimal": ","}` and calls `apply_client_processing()` with
`viz_type="pivot_table_v2"` and `result_format="csv"`.
2. Inspect `apply_client_processing()` in
`superset/charts/client_processing.py:312-398`,
where pivoted CSV results are produced by `pivot_table_v2()` and written via
`processed_df.to_csv(buf, index=show_default_index, **csv_export_config)` at
lines 391-395
using the `CSV_EXPORT` configuration.
3. Introduce a regression by modifying the `to_csv` call in
`apply_client_processing()`
(lines 391-395) so that it ignores `CSV_EXPORT` (e.g., remove
`**csv_export_config` or
hard-code default `sep=","`, `decimal="."`), causing the pivoted CSV to use
default
separators and decimals.
4. Run `pytest
tests/unit_tests/charts/test_client_processing.py::test_apply_client_processing_csv_pivot_table_custom_separator`;
observe that the test still passes, because the only assertion at
`tests/unit_tests/charts/test_client_processing.py:2883-2889` merely checks
that the
output string contains either `";"` or `","`, which is true for any CSV
output regardless
of whether `CSV_EXPORT` was honored.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/charts/test_client_processing.py
**Line:** 2886:2889
**Comment:**
*Logic Error: The pivot-table CSV regression test only asserts that the
output contains either a semicolon or comma, which will usually be true
regardless of whether the CSV_EXPORT separator/decimal configuration is
honored, so it does not actually verify the intended behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=56c6b6faac0df1597dd89738e07680475a30b48ca217f3e277ae70db03702f7b&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=56c6b6faac0df1597dd89738e07680475a30b48ca217f3e277ae70db03702f7b&reaction=dislike'>👎</a>
##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2788,3 +2788,102 @@ def
test_apply_client_processing_csv_format_default_na_behavior():
assert (
"Alice," in lines[2]
) # Second data row should have empty last_name (NA converted to null)
+
+
+@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})
+def test_apply_client_processing_csv_format_custom_separator():
+ """
+ Test that apply_client_processing respects CSV_EXPORT config
+ for custom separator and decimal character.
+
+ This is a regression test for GitHub issue #32371.
+ """
+ # CSV data with numeric values
+ csv_data = "name,value\nAlice,1.5\nBob,2.75"
+
+ result = {
+ "queries": [
+ {
+ "result_format": ChartDataResultFormat.CSV,
+ "data": csv_data,
+ }
+ ]
+ }
+
+ form_data = {
+ "datasource": "1__table",
+ "viz_type": "table",
+ "slice_id": 1,
+ "url_params": {},
+ "metrics": [],
+ "groupby": [],
+ "columns": ["name", "value"],
+ "extra_form_data": {},
+ "force": False,
+ "result_format": "csv",
+ "result_type": "results",
+ }
+
+ processed_result = apply_client_processing(result, form_data)
+
+ output_data = processed_result["queries"][0]["data"]
+ lines = output_data.strip().split("\n")
+
+ # With sep=";", columns should be separated by semicolon
+ assert "name;value" in lines[0]
+ # With decimal=",", decimal values should use comma as separator
+ assert "Alice;1,5" in lines[1] or "Alice;1.5" in lines[1]
+ assert "Bob;2,75" in lines[2] or "Bob;2.75" in lines[2]
Review Comment:
**Suggestion:** The regression test for custom CSV separator and decimal
allows both comma and dot decimals, so it will still pass even if the
CSV_EXPORT decimal setting is ignored, failing to detect regressions in numeric
formatting. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Unit test for CSV decimal formatting may silently pass.
- ⚠️ Table CSV exports with custom decimal can regress unnoticed.
```
</details>
```suggestion
assert lines[0] == "name;value"
# With decimal=",", decimal values should use comma as separator
assert "Alice;1,5" in lines[1]
assert "Bob;2,75" in lines[2]
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open `test_apply_client_processing_csv_format_custom_separator()` in
`tests/unit_tests/charts/test_client_processing.py:2793-2827` (added in this
PR) which
uses `@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})` and calls
`apply_client_processing()`.
2. Note that `apply_client_processing()` in
`superset/charts/client_processing.py:312-398`
formats CSV exports via `processed_df.to_csv(...)` using `csv_export_config =
current_app.config.get("CSV_EXPORT", {})` at lines 391-395.
3. Introduce a regression by editing `processed_df.to_csv(buf,
index=show_default_index,
**csv_export_config)` at `superset/charts/client_processing.py:393-394` so
that it no
longer passes the `decimal` option (e.g., only passes `sep`), causing
numeric values in
the exported table CSV to retain a `.` decimal separator.
4. Run `pytest
tests/unit_tests/charts/test_client_processing.py::test_apply_client_processing_csv_format_custom_separator`;
observe the test still passes because the assertions at
`tests/unit_tests/charts/test_client_processing.py:2832-2836` accept both
`"Alice;1,5"`
and `"Alice;1.5"` (and likewise for Bob), so the regression in honoring
`CSV_EXPORT["decimal"]` is not detected.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/charts/test_client_processing.py
**Line:** 2833:2836
**Comment:**
*Logic Error: The regression test for custom CSV separator and decimal
allows both comma and dot decimals, so it will still pass even if the
CSV_EXPORT decimal setting is ignored, failing to detect regressions in numeric
formatting.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=5c5b778b282a68d4599918e0d587c90835840011f871e3c76d19ea6897a53b3e&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38170&comment_hash=5c5b778b282a68d4599918e0d587c90835840011f871e3c76d19ea6897a53b3e&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]