Copilot commented on code in PR #40645:
URL: https://github.com/apache/superset/pull/40645#discussion_r3337906969
##########
superset/commands/logs/prune.py:
##########
@@ -64,8 +64,11 @@ def run(self) -> None:
start_time = time.time()
# Select all IDs that need to be deleted
+ # Log.dttm is stored in UTC, so compute the cutoff in UTC as well to
+ # keep retention correct regardless of the server's local timezone.
select_stmt = sa.select(Log.id).where(
- Log.dttm < datetime.now() -
timedelta(days=self.retention_period_days)
+ Log.dttm
+ < datetime.now(tz=timezone.utc) -
timedelta(days=self.retention_period_days)
Review Comment:
`Log.dttm` is declared as a naive `DateTime` column
(`superset/models/core.py:1454`) populated with `datetime.utcnow`, i.e.
timezone‑naive UTC. Comparing it to a timezone‑aware
`datetime.now(tz=timezone.utc)` value mixes naive and aware datetimes at the
SQL layer: on PostgreSQL this raises `can't compare offset-naive and
offset-aware datetimes` / `operator does not exist: timestamp without time zone
< timestamp with time zone` (or, if implicitly cast, uses the session's local
timezone), and on SQLite it serializes with a `+00:00` offset, breaking the
lexicographic comparison against naive ISO strings already in the column. The
original "fix local timezone" intent is correct, but the safer change is to
keep the comparison value naive UTC, e.g. `datetime.utcnow()` (matching the
column's `default`) or `datetime.now(tz=timezone.utc).replace(tzinfo=None)`.
##########
tests/unit_tests/utils/csv_tests.py:
##########
@@ -63,6 +63,13 @@ def test_escape_value():
result = csv.escape_value(" =10+2")
assert result == "' =10+2"
+ # A leading tab or carriage return is also treated as a dangerous prefix.
+ result = csv.escape_value("\t=10+2")
+ assert result == "'\t=10+2"
+
+ result = csv.escape_value("\r=10+2")
+ assert result == "'\r=10+2"
Review Comment:
These two assertions don't actually verify the new behavior added by this
regex change. The pre-existing alternative `^(?:"{2}|\s{1,})(?=[\-@+|=%])`
already matches leading whitespace (including `\t` and `\r`, since `\s` covers
both) followed by a dangerous char, so `"\t=10+2"` and `"\r=10+2"` were already
escaped before this PR. To exercise the newly-added `\t`/`\r` alternative, test
inputs whose dangerous prefix is the tab/CR itself, e.g. `"\tsafe"` /
`"\rsafe"` (or `"\t"` alone), which the old regex would not have escaped.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]