codeant-ai-for-open-source[bot] commented on code in PR #40195:
URL: https://github.com/apache/superset/pull/40195#discussion_r3254393712


##########
superset/utils/csv.py:
##########
@@ -28,15 +27,27 @@
 
 logger = logging.getLogger(__name__)
 
-negative_number_re = re.compile(r"^-[0-9.]+$")
+PROBLEMATIC_CSV_PREFIXES = "-@+|=%"
 
-# This regex will match if the string starts with:
-#
-#     1. one of -, @, +, |, =, %
-#     2. two double quotes immediately followed by one of -, @, +, |, =, %
-#     3. one or more spaces immediately followed by one of -, @, +, |, =, %
-#
-problematic_chars_re = re.compile(r'^(?:"{2}|\s{1,})(?=[\-@+|=%])|^[\-@+|=%]')
+
+def _starts_like_spreadsheet_formula(value: str) -> bool:
+    first = value[0]
+    if first in PROBLEMATIC_CSV_PREFIXES:
+        return True
+    if first == '"' and len(value) > 2:
+        return value[1] == '"' and value[2] in PROBLEMATIC_CSV_PREFIXES
+    if first.isspace():
+        stripped = value.lstrip()
+        return bool(stripped) and stripped[0] in PROBLEMATIC_CSV_PREFIXES

Review Comment:
   **Suggestion:** The leading-whitespace branch only checks `stripped[0]` 
against prefix characters, so values like ` " "=10+2` (whitespace followed by 
doubled-quote formula prefix, e.g. `  ""=...`) are not escaped. This creates a 
CSV-injection bypass for the quoted-formula case whenever attackers prepend 
whitespace. Reuse the same quoted-prefix logic after trimming whitespace 
(instead of only checking the first trimmed character) so both direct and 
whitespace-prefixed quoted formulas are escaped. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ CSV chart exports allow spreadsheet formula injection bypass.
   - ❌ Malicious cell formulas execute when analysts open exported CSVs.
   - ⚠️ Scheduled email reports with CSV attachments inherit this risk.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create or identify a chart whose underlying dataset has a string column 
containing a
   value like ` ""=10+2` (two leading spaces, then `""=10+2`), so this value 
appears in the
   chart's result DataFrame.
   
   2. Trigger a CSV export for that chart via the public API `GET
   /api/v1/chart/<pk>/data/?format=csv&type=full` or the UI "Download as CSV", 
which is
   handled by `ChartDataRestApi.data` in `superset/charts/data/api.py:65-105`. 
This builds a
   `QueryContext` and later calls `_get_data_response`, which eventually calls
   `QueryContext.get_data()` as seen in `superset/common/query_actions.py:4-23` 
and
   `superset/common/query_context.py:7-12`.
   
   3. Inside `QueryContextProcessor.get_data`
   (`superset/common/query_context_processor.py:245-260`), when `result_format 
==
   ChartDataResultFormat.CSV`, the code calls `csv.df_to_escaped_csv(df, 
index=include_index,
   **current_app.config["CSV_EXPORT"])`, which in turn applies `escape_value` 
to all string
   headers and object-typed cell values in `superset/utils/csv.py:78-90`.
   
   4. For the malicious cell value ` ""=10+2`, `escape_value` 
(`superset/utils/csv.py:53-75`)
   calls `_starts_like_spreadsheet_formula` (`superset/utils/csv.py:33-42`). In 
that helper,
   the first character is whitespace so the `if first.isspace()` branch at 
lines 39-41 runs,
   `stripped = value.lstrip()` becomes `'""=10+2'`, and the function returns 
`bool(stripped)
   and stripped[0] in PROBLEMATIC_CSV_PREFIXES`. Because `stripped[0]` is `"`, 
not one of
   `-@+|=%` from `PROBLEMATIC_CSV_PREFIXES` (line 30), 
`_starts_like_spreadsheet_formula`
   incorrectly returns `False`. As a result the guard `if
   _starts_like_spreadsheet_formula(value) and not _is_negative_number(value):` 
at lines
   62-63 is not taken, the apostrophe quoting is never added, and the CSV cell 
is written
   containing ` ""=10+2`. When a user opens this exported CSV in a spreadsheet 
(Excel, Google
   Sheets), this whitespace-prefixed, double-quoted formula is still 
interpreted as a
   formula, creating a CSV injection vector. This matches the suggestion's 
claim that
   whitespace-prefixed doubled-quote formulas bypass the current escaping 
because the
   whitespace branch only checks `stripped[0]` against the prefix set and never 
reuses the
   quoted-formula logic from lines 37-38 on the stripped value.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fcsv.py%0A%2A%2ALine%3A%2A%2A%2039%3A41%0A%2A%2AComment%3A%2A%2A%0A%09%2ASecurity%3A%20The%20leading-whitespace%20branch%20only%20checks%20%60stripped%5B0%5D%60%20against%20prefix%20characters%2C%20so%20values%20like%20%60%20%22%20%22%3D10%2B2%60%20%28whitespace%20followed%20by%20doubled-quote%20formula%20prefix%2C%20e.g.%20%60%20%20%22%22%3D...%60%29%20are%20not%20escaped.%20This%20creates%20a%20CSV-injection%20bypass%20for%20the%20quoted-formula%20case%20whenever%20attackers%20prepend%20whitespace.%20Reuse%20the%20same%20quoted-prefix%20logic%20after%20trimming%20whitespace%20%28instead%20of%20only%20checking%20the%20first%20trimmed%20character%29%20so%20both%20direct%20and%20whitespace-prefixed%20quoted%20formulas%20are%20escaped.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20corre
 
ct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fcsv.py%0A%2A%2ALine%3A%2A%2A%2039%3A41%0A%2A%2AComment%3A%2A%2A%0A%09%2ASecurity%3A%20The%20leading-whitespace%20branch%20only%20checks%20%60stripped%5B0%5D%60%20against%20prefix%20characters%2C%20so%20values%20like%20%60%20%22%20%22%3D10%2B2%60%20%28whitespace%20followed%20by%20doubled-quote%20formula%20prefix%2C%20e.g.%20%60%20%20%22%22%3D
 
...%60%29%20are%20not%20escaped.%20This%20creates%20a%20CSV-injection%20bypass%20for%20the%20quoted-formula%20case%20whenever%20attackers%20prepend%20whitespace.%20Reuse%20the%20same%20quoted-prefix%20logic%20after%20trimming%20whitespace%20%28instead%20of%20only%20checking%20the%20first%20trimmed%20character%29%20so%20both%20direct%20and%20whitespace-prefixed%20quoted%20formulas%20are%20escaped.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/csv.py
   **Line:** 39:41
   **Comment:**
        *Security: The leading-whitespace branch only checks `stripped[0]` 
against prefix characters, so values like ` " "=10+2` (whitespace followed by 
doubled-quote formula prefix, e.g. `  ""=...`) are not escaped. This creates a 
CSV-injection bypass for the quoted-formula case whenever attackers prepend 
whitespace. Reuse the same quoted-prefix logic after trimming whitespace 
(instead of only checking the first trimmed character) so both direct and 
whitespace-prefixed quoted formulas are escaped.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40195&comment_hash=47726552491dc2ef8ae4c23ddb593a0bf2f1404e3558a685b6d6293ed48fe155&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40195&comment_hash=47726552491dc2ef8ae4c23ddb593a0bf2f1404e3558a685b6d6293ed48fe155&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to