fitzee commented on PR #40127:
URL: https://github.com/apache/superset/pull/40127#issuecomment-4455494822
## Griffen + Codex Synthesis — Needs changes
---
## Dual Review Synthesis
**Griffen verdict:** LGTM with nits
**Codex verdict:** Needs changes
**Agreement level:** Partial — diverge on door classification and the import
path finding
### Where they agreed
- `result_set.py` fix is correct: `errors="coerce"` → NaT stops the
`OutOfBoundsDatetime` from reaching `logger.exception()`, test verifies the
right invariants
- No test coverage for the import path (`utils.py`)
- Silent NaT in `result_set.py` with no warning is an inconsistency worth
noting
### Where they diverged
**Door classification — Codex says ONE-WAY for the import path; Griffen
initially said TWO-WAY overall. Codex is correct.**
`result_set.py` is two-way: stateless query display, fully reversible by
revert.
`utils.py` (the import path) has a one-way characteristic: `errors="coerce"`
writes `NULL`/`NaT` into the imported dataset. Rolling back the PR code does
not recover values already coerced during a prior import. That's persistence,
which makes it one-way for data already touched.
### Standout finding — caught by Codex, missed by Griffen
**[MEDIUM] `errors="coerce"` on the import path is broader than the stated
fix**
`pd.to_datetime(df[column_name], errors="coerce")` coerces *all* parse
failures to NaT — not just out-of-bounds timestamps. A CSV with `"not-a-date"`
or a wrong-format string in a datetime column will now import silently with
NaT, where previously it would raise and block the import. The warning message
says `"out-of-range datetime value(s)"` but the actual behavior covers
malformed values too — misleading for operators reading logs.
*Confidence: Fact*
This is a data-quality contract change that goes beyond the stated intent of
the PR. The `result_set.py` fix is correctly scoped to the observability
problem. The import path fix is not.
**Mitigations (pick one):**
1. Scope to only `OutOfBoundsDatetime` on the import path:
```python
try:
df[column_name] = pd.to_datetime(df[column_name])
except OutOfBoundsDatetime:
converted = pd.to_datetime(df[column_name], errors="coerce")
n_coerced = int(converted.isna().sum() - df[column_name].isna().sum())
logger.warning("Coerced %d out-of-bounds datetime value(s) in column
'%s' to NaT", n_coerced, column_name)
df[column_name] = converted
```
This preserves the fail-loud behavior for malformed input while fixing the
out-of-bounds case.
2. Or fix the warning message to say `"invalid or out-of-range"` and add
explicit tests documenting that malformed dates are now silently accepted —
making the contract change intentional and observable.
---
### Griffen's final call
**Needs changes** — scoped to the import path. The `result_set.py` fix is
sound; merge that half. The `utils.py` change either needs to be scoped to
`OutOfBoundsDatetime` only, or the broader coercion needs to be a documented
and tested product decision rather than an incidental side effect.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]