frlm opened a new pull request, #30961: URL: https://github.com/apache/superset/pull/30961
**Title:** fix(csv_export): use custom CSV_EXPORT parameters in pd.read_csv

### Bug description

**Function:** `apply_post_process`

`pd.read_csv` is called with the pandas default values instead of the parameters defined in `CSV_EXPORT` in `superset_config`. This is rarely noticeable when the separator is `,` and the decimal mark is `.`. However, with the configuration `CSV_EXPORT='{"encoding": "utf-8", "sep": ";", "decimal": ","}'`, the issue becomes evident. This change ensures that `pd.read_csv` uses the parameters defined in `CSV_EXPORT`.

**Steps to reproduce error**:

- Configure `CSV_EXPORT` with the following parameters:

```python
CSV_EXPORT = {
    "encoding": "utf-8",
    "sep": ";",
    "decimal": ","
}
```

- Open a default chart in Superset of the Pivot Table type. In this example, we are using Pivot Table v2 within the USA Births Names dashboard.
- Click on Download > **Export to Pivoted .CSV**
- Download is blocked by an error.

**Cause**: The error is caused by the input DataFrame `df` being parsed incorrectly. It ends up with a single column in which all fields are still joined by the semicolon separator:

```
,state;name;sum__num
0,other;Michael;1047996
1,other;Christopher;803607
2,other;James;749686
```

**Fix**: Read the CSV data using the `sep`, `encoding`, and `decimal` values from `CSV_EXPORT`.

**Code Changes:**

~~~python
elif query["result_format"] == ChartDataResultFormat.CSV:
    df = pd.read_csv(
        StringIO(data),
        delimiter=superset_config.CSV_EXPORT.get('sep'),
        encoding=superset_config.CSV_EXPORT.get('encoding'),
        decimal=superset_config.CSV_EXPORT.get('decimal'),
    )
~~~

**Complete Code**

~~~python
def apply_post_process(
    result: dict[Any, Any],
    form_data: Optional[dict[str, Any]] = None,
    datasource: Optional[Union["BaseDatasource", "Query"]] = None,
) -> dict[Any, Any]:
    form_data = form_data or {}

    viz_type = form_data.get("viz_type")
    if viz_type not in post_processors:
        return result

    post_processor = post_processors[viz_type]

    for query in result["queries"]:
        if query["result_format"] not in (rf.value for rf in ChartDataResultFormat):
            raise Exception(  # pylint: disable=broad-exception-raised
                f"Result format {query['result_format']} not supported"
            )

        data = query["data"]

        if isinstance(data, str):
            data = data.strip()

        if not data:
            # do not try to process empty data
            continue

        if query["result_format"] == ChartDataResultFormat.JSON:
            df = pd.DataFrame.from_dict(data)
        elif query["result_format"] == ChartDataResultFormat.CSV:
            # read the CSV with the separator, encoding and decimal mark
            # configured in CSV_EXPORT instead of the pandas defaults
            df = pd.read_csv(
                StringIO(data),
                delimiter=superset_config.CSV_EXPORT.get('sep'),
                encoding=superset_config.CSV_EXPORT.get('encoding'),
                decimal=superset_config.CSV_EXPORT.get('decimal'),
            )

        # convert all columns to verbose (label) name
        if datasource:
            df.rename(columns=datasource.data["verbose_map"], inplace=True)

        processed_df = post_processor(df, form_data, datasource)

        query["colnames"] = list(processed_df.columns)
        query["indexnames"] = list(processed_df.index)
        query["coltypes"] = extract_dataframe_dtypes(processed_df, datasource)
        query["rowcount"] = len(processed_df.index)

        # Flatten hierarchical columns/index since they are represented as
        # `Tuple[str]`. Otherwise encoding to JSON later will fail because
        # maps cannot have tuples as their keys in JSON.
        processed_df.columns = [
            " ".join(str(name) for name in column).strip()
            if isinstance(column, tuple)
            else column
            for column in processed_df.columns
        ]
        processed_df.index = [
            " ".join(str(name) for name in index).strip()
            if isinstance(index, tuple)
            else index
            for index in processed_df.index
        ]

        if query["result_format"] == ChartDataResultFormat.JSON:
            query["data"] = processed_df.to_dict()
        elif query["result_format"] == ChartDataResultFormat.CSV:
            buf = StringIO()
            processed_df.to_csv(buf)
            buf.seek(0)
            query["data"] = buf.getvalue()

    return result
~~~

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
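The cause described in the pull request can be reproduced outside Superset with pandas alone. The following is a minimal sketch, not the project's code: the module-level `CSV_EXPORT` dict is a hypothetical stand-in for the value in `superset_config`, and the sample strings (including the `ratio` column) are made up for illustration. The `encoding` entry is left out of the call because the data here is already a decoded string.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the CSV_EXPORT setting in superset_config.
CSV_EXPORT = {"encoding": "utf-8", "sep": ";", "decimal": ","}

# Sample data written with ';' as the field separator (illustrative only).
int_data = "state;name;sum__num\nother;Michael;1047996\nother;James;749686\n"

# With the pandas defaults (sep=','), each line is read as one single field,
# so the whole header becomes the name of the only column.
default_df = pd.read_csv(StringIO(int_data))

# Passing the configured separator and decimal mark splits the fields.
configured_df = pd.read_csv(
    StringIO(int_data),
    sep=CSV_EXPORT["sep"],
    decimal=CSV_EXPORT["decimal"],
)

# The decimal mark matters once a column holds non-integer values.
dec_data = "name;ratio\nMichael;10,5\n"
decimal_df = pd.read_csv(
    StringIO(dec_data),
    sep=CSV_EXPORT["sep"],
    decimal=CSV_EXPORT["decimal"],
)

print(list(default_df.columns))
print(list(configured_df.columns))
print(decimal_df["ratio"].iloc[0])
```

A pivot post-processor that looks up `state` or `name` as column names would not find them in `default_df`, which is consistent with the blocked download reported above.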
To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: notifications-h...@superset.apache.org