Re: [PR] fix(exports,email,logs): csv formula escaping, subject CRLF stripping, UTC log pruning [superset]

via GitHub Mon, 01 Jun 2026 17:32:01 -0700


Copilot commented on code in PR #40645:
URL: https://github.com/apache/superset/pull/40645#discussion_r3337906969



##########
superset/commands/logs/prune.py:
##########
@@ -64,8 +64,11 @@ def run(self) -> None:
         start_time = time.time()
 
         # Select all IDs that need to be deleted
+        # Log.dttm is stored in UTC, so compute the cutoff in UTC as well to
+        # keep retention correct regardless of the server's local timezone.
         select_stmt = sa.select(Log.id).where(
-            Log.dttm < datetime.now() - 
timedelta(days=self.retention_period_days)
+            Log.dttm
+            < datetime.now(tz=timezone.utc) - 
timedelta(days=self.retention_period_days)

Review Comment:
   `Log.dttm` is declared as a naive `DateTime` column 
(`superset/models/core.py:1454`) populated with `datetime.utcnow`, i.e. 
timezone‑naive UTC. Comparing it to a timezone‑aware 
`datetime.now(tz=timezone.utc)` value mixes naive and aware datetimes at the 
SQL layer: on PostgreSQL this raises `can't compare offset-naive and 
offset-aware datetimes` / `operator does not exist: timestamp without time zone 
< timestamp with time zone` (or, if implicitly cast, uses the session's local 
timezone), and on SQLite it serializes with a `+00:00` offset, breaking the 
lexicographic comparison against naive ISO strings already in the column. The 
original "fix local timezone" intent is correct, but the safer change is to 
keep the comparison value naive UTC, e.g. `datetime.utcnow()` (matching the 
column's `default`) or `datetime.now(tz=timezone.utc).replace(tzinfo=None)`.



##########
tests/unit_tests/utils/csv_tests.py:
##########
@@ -63,6 +63,13 @@ def test_escape_value():
     result = csv.escape_value(" =10+2")
     assert result == "' =10+2"
 
+    # A leading tab or carriage return is also treated as a dangerous prefix.
+    result = csv.escape_value("\t=10+2")
+    assert result == "'\t=10+2"
+
+    result = csv.escape_value("\r=10+2")
+    assert result == "'\r=10+2"

Review Comment:
   These two assertions don't actually verify the new behavior added by this 
regex change. The pre-existing alternative `^(?:"{2}|\s{1,})(?=[\-@+|=%])` 
already matches leading whitespace (including `\t` and `\r`, since `\s` covers 
both) followed by a dangerous char, so `"\t=10+2"` and `"\r=10+2"` were already 
escaped before this PR. To exercise the newly-added `\t`/`\r` alternative, test 
inputs whose dangerous prefix is the tab/CR itself, e.g. `"\tsafe"` / 
`"\rsafe"` (or `"\t"` alone), which the old regex would not have escaped.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(exports,email,logs): csv formula escaping, subject CRLF stripping, UTC log pruning [superset]

Reply via email to