fitzee commented on PR #40127:
URL: https://github.com/apache/superset/pull/40127#issuecomment-4455494822

   ## Griffen + Codex Synthesis — Needs changes
   
   ---
   
   ## Dual Review Synthesis
   
   **Griffen verdict:** LGTM with nits
   **Codex verdict:** Needs changes
   **Agreement level:** Partial — diverge on door classification and the import 
path finding
   
   ### Where they agreed
   - `result_set.py` fix is correct: `errors="coerce"` → NaT stops the 
`OutOfBoundsDatetime` from reaching `logger.exception()`, test verifies the 
right invariants
   - No test coverage for the import path (`utils.py`)
   - Silent NaT in `result_set.py` with no warning is an inconsistency worth 
noting
   
   ### Where they diverged
   
   **Door classification — Codex says ONE-WAY for the import path; Griffen 
initially said TWO-WAY overall. Codex is correct.**
   
   `result_set.py` is two-way: stateless query display, fully reversible by 
revert.
   
   `utils.py` (the import path) has a one-way characteristic: `errors="coerce"` 
writes `NULL`/`NaT` into the imported dataset. Rolling back the PR code does 
not recover values already coerced during a prior import. That's persistence, 
which makes it one-way for data already touched.
   
   ### Standout finding — caught by Codex, missed by Griffen
   
   **[MEDIUM] `errors="coerce"` on the import path is broader than the stated 
fix**
   
   `pd.to_datetime(df[column_name], errors="coerce")` coerces *all* parse 
failures to NaT — not just out-of-bounds timestamps. A CSV with `"not-a-date"` 
or a wrong-format string in a datetime column will now import silently with 
NaT, where previously it would raise and block the import. The warning message 
says `"out-of-range datetime value(s)"` but the actual behavior covers 
malformed values too — misleading for operators reading logs.
   
   *Confidence: Fact*
   
   This is a data-quality contract change that goes beyond the stated intent of 
the PR. The `result_set.py` fix is correctly scoped to the observability 
problem. The import path fix is not.
   
   **Mitigations (pick one):**
   
   1. Scope to only `OutOfBoundsDatetime` on the import path:
   ```python
   try:
       df[column_name] = pd.to_datetime(df[column_name])
   except OutOfBoundsDatetime:
       converted = pd.to_datetime(df[column_name], errors="coerce")
       n_coerced = int(converted.isna().sum() - df[column_name].isna().sum())
       logger.warning("Coerced %d out-of-bounds datetime value(s) in column 
'%s' to NaT", n_coerced, column_name)
       df[column_name] = converted
   ```
   This preserves the fail-loud behavior for malformed input while fixing the 
out-of-bounds case.
   
   2. Or fix the warning message to say `"invalid or out-of-range"` and add 
explicit tests documenting that malformed dates are now silently accepted — 
making the contract change intentional and observable.
   
   ---
   
   ### Griffen's final call
   
   **Needs changes** — scoped to the import path. The `result_set.py` fix is 
sound; merge that half. The `utils.py` change either needs to be scoped to 
`OutOfBoundsDatetime` only, or the broader coercion needs to be a documented 
and tested product decision rather than an incidental side effect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to