codeant-ai-for-open-source[bot] commented on code in PR #40195:
URL: https://github.com/apache/superset/pull/40195#discussion_r3254393712
##########
superset/utils/csv.py:
##########
@@ -28,15 +27,27 @@
logger = logging.getLogger(__name__)
-negative_number_re = re.compile(r"^-[0-9.]+$")
+PROBLEMATIC_CSV_PREFIXES = "-@+|=%"
-# This regex will match if the string starts with:
-#
-# 1. one of -, @, +, |, =, %
-# 2. two double quotes immediately followed by one of -, @, +, |, =, %
-# 3. one or more spaces immediately followed by one of -, @, +, |, =, %
-#
-problematic_chars_re = re.compile(r'^(?:"{2}|\s{1,})(?=[\-@+|=%])|^[\-@+|=%]')
+
+def _starts_like_spreadsheet_formula(value: str) -> bool:
+ first = value[0]
+ if first in PROBLEMATIC_CSV_PREFIXES:
+ return True
+ if first == '"' and len(value) > 2:
+ return value[1] == '"' and value[2] in PROBLEMATIC_CSV_PREFIXES
+ if first.isspace():
+ stripped = value.lstrip()
+ return bool(stripped) and stripped[0] in PROBLEMATIC_CSV_PREFIXES
Review Comment:
**Suggestion:** The leading-whitespace branch only checks `stripped[0]`
against prefix characters, so values like ` " "=10+2` (whitespace followed by
doubled-quote formula prefix, e.g. ` ""=...`) are not escaped. This creates a
CSV-injection bypass for the quoted-formula case whenever attackers prepend
whitespace. Reuse the same quoted-prefix logic after trimming whitespace
(instead of only checking the first trimmed character) so both direct and
whitespace-prefixed quoted formulas are escaped. [security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ CSV chart exports allow spreadsheet formula injection bypass.
- ❌ Malicious cell formulas execute when analysts open exported CSVs.
- ⚠️ Scheduled email reports with CSV attachments inherit this risk.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Create or identify a chart whose underlying dataset has a string column
containing a
value like ` ""=10+2` (two leading spaces, then `""=10+2`), so this value
appears in the
chart's result DataFrame.
2. Trigger a CSV export for that chart via the public API `GET
/api/v1/chart/<pk>/data/?format=csv&type=full` or the UI "Download as CSV",
which is
handled by `ChartDataRestApi.data` in `superset/charts/data/api.py:65-105`.
This builds a
`QueryContext` and later calls `_get_data_response`, which eventually calls
`QueryContext.get_data()` as seen in `superset/common/query_actions.py:4-23`
and
`superset/common/query_context.py:7-12`.
3. Inside `QueryContextProcessor.get_data`
(`superset/common/query_context_processor.py:245-260`), when `result_format
==
ChartDataResultFormat.CSV`, the code calls `csv.df_to_escaped_csv(df,
index=include_index,
**current_app.config["CSV_EXPORT"])`, which in turn applies `escape_value`
to all string
headers and object-typed cell values in `superset/utils/csv.py:78-90`.
4. For the malicious cell value ` ""=10+2`, `escape_value`
(`superset/utils/csv.py:53-75`)
calls `_starts_like_spreadsheet_formula` (`superset/utils/csv.py:33-42`). In
that helper,
the first character is whitespace so the `if first.isspace()` branch at
lines 39-41 runs,
`stripped = value.lstrip()` becomes `'""=10+2'`, and the function returns
`bool(stripped)
and stripped[0] in PROBLEMATIC_CSV_PREFIXES`. Because `stripped[0]` is `"`,
not one of
`-@+|=%` from `PROBLEMATIC_CSV_PREFIXES` (line 30),
`_starts_like_spreadsheet_formula`
incorrectly returns `False`. As a result the guard `if
_starts_like_spreadsheet_formula(value) and not _is_negative_number(value):`
at lines
62-63 is not taken, the apostrophe quoting is never added, and the CSV cell
is written
containing ` ""=10+2`. When a user opens this exported CSV in a spreadsheet
(Excel, Google
Sheets), this whitespace-prefixed, double-quoted formula is still
interpreted as a
formula, creating a CSV injection vector. This matches the suggestion's
claim that
whitespace-prefixed doubled-quote formulas bypass the current escaping
because the
whitespace branch only checks `stripped[0]` against the prefix set and never
reuses the
quoted-formula logic from lines 37-38 on the stripped value.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fcsv.py%0A%2A%2ALine%3A%2A%2A%2039%3A41%0A%2A%2AComment%3A%2A%2A%0A%09%2ASecurity%3A%20The%20leading-whitespace%20branch%20only%20checks%20%60stripped%5B0%5D%60%20against%20prefix%20characters%2C%20so%20values%20like%20%60%20%22%20%22%3D10%2B2%60%20%28whitespace%20followed%20by%20doubled-quote%20formula%20prefix%2C%20e.g.%20%60%20%20%22%22%3D...%60%29%20are%20not%20escaped.%20This%20creates%20a%20CSV-injection%20bypass%20for%20the%20quoted-formula%20case%20whenever%20attackers%20prepend%20whitespace.%20Reuse%20the%20same%20quoted-prefix%20logic%20after%20trimming%20whitespace%20%28instead%20of%20only%20checking%20the%20first%20trimmed%20character%29%20so%20both%20direct%20and%20whitespace-prefixed%20quoted%20formulas%20are%20escaped.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20corre
ct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fcsv.py%0A%2A%2ALine%3A%2A%2A%2039%3A41%0A%2A%2AComment%3A%2A%2A%0A%09%2ASecurity%3A%20The%20leading-whitespace%20branch%20only%20checks%20%60stripped%5B0%5D%60%20against%20prefix%20characters%2C%20so%20values%20like%20%60%20%22%20%22%3D10%2B2%60%20%28whitespace%20followed%20by%20doubled-quote%20formula%20prefix%2C%20e.g.%20%60%20%20%22%22%3D
...%60%29%20are%20not%20escaped.%20This%20creates%20a%20CSV-injection%20bypass%20for%20the%20quoted-formula%20case%20whenever%20attackers%20prepend%20whitespace.%20Reuse%20the%20same%20quoted-prefix%20logic%20after%20trimming%20whitespace%20%28instead%20of%20only%20checking%20the%20first%20trimmed%20character%29%20so%20both%20direct%20and%20whitespace-prefixed%20quoted%20formulas%20are%20escaped.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/csv.py
**Line:** 39:41
**Comment:**
*Security: The leading-whitespace branch only checks `stripped[0]`
against prefix characters, so values like ` " "=10+2` (whitespace followed by
doubled-quote formula prefix, e.g. ` ""=...`) are not escaped. This creates a
CSV-injection bypass for the quoted-formula case whenever attackers prepend
whitespace. Reuse the same quoted-prefix logic after trimming whitespace
(instead of only checking the first trimmed character) so both direct and
whitespace-prefixed quoted formulas are escaped.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40195&comment_hash=47726552491dc2ef8ae4c23ddb593a0bf2f1404e3558a685b6d6293ed48fe155&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40195&comment_hash=47726552491dc2ef8ae4c23ddb593a0bf2f1404e3558a685b6d6293ed48fe155&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]