Re: [PR] fix(mixed-timeseries): preserve all-NaN metric columns after pivot when Jinja evaluates to NULL [superset]

via GitHub Sun, 10 May 2026 17:31:02 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40004:
URL: https://github.com/apache/superset/pull/40004#discussion_r3215791235



##########
superset/utils/pandas_postprocessing/pivot.py:
##########
@@ -27,6 +28,35 @@
 )
 
 
+def _restore_dropped_metric_columns(
+    df: DataFrame, expected_metrics: list[str]
+) -> DataFrame:
+    """Re-add metric columns that pivot_table dropped due to all-NaN values.
+
+    When drop_missing_columns=True, pandas pivot_table silently removes columns
+    whose entries are all NaN. This breaks downstream post-processing steps
+    (rename, rolling) that use validate_column_args to assert the columns 
exist.
+    Restoring the columns as all-NaN preserves the expected schema.
+    """
+    if isinstance(df.columns, pd.MultiIndex):
+        existing_metrics = set(df.columns.get_level_values(0))
+        missing = [m for m in expected_metrics if m not in existing_metrics]
+        if missing:
+            category_values = (
+                df.columns.get_level_values(-1).unique()
+                if len(df.columns) > 0
+                else [None]
+            )
+            for metric in missing:
+                for cat in category_values:
+                    df[(metric, cat)] = float("nan")

Review Comment:
   **Suggestion:** The restoration logic for MultiIndex pivots only uses the 
last column level (`get_level_values(-1)`) and then writes 2-tuples (`(metric, 
cat)`), which is incorrect when the pivot has more than one `columns` 
dimension. For pivots like `columns=["a","b"]`, this creates 
malformed/incomplete column keys (or can raise when tuple depth does not 
match), so all-NaN metrics are not restored correctly and downstream operations 
can still fail. Build restored keys using the full non-metric tuple for every 
existing column combination (all levels after metric), not just the last level. 
[logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Multi-column pivot post-processing can crash during pivot().
   - ❌ All-NaN metrics not restored for multi-level pivots.
   - ⚠️ Mixed-timeseries charts using multi-column pivots may fail.
   - ⚠️ Rolling/rename post-processing fails for affected metrics.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a query with post-processing in 
`superset/common/query_object.py:500-534` so
   that `exec_post_processing()` includes a pivot operation with `operation: 
"pivot"` and
   `options` containing `columns=["a", "b"]`, `drop_missing_columns=True` 
(default), and
   multiple metrics (e.g. `"metric": {"operator": "mean"}` and `"metric2": 
{"operator":
   "mean"}`).
   
   2. Ensure the database result feeding this query contains multiple grouping 
columns `a`
   and `b` and that, for at least one metric (e.g. `"metric"`), all values are 
NULL/NaN for
   every row; this matches the all-NaN metric scenario already covered for 
single-column
   pivots in
   
`tests/unit_tests/pandas_postprocessing/test_pivot.py:test_pivot_preserves_all_nan_metric_flat`
   (lines 70-92), but extended here to `columns=["a", "b"]` as in
   `test_pivot_eliminate_cartesian_product_columns` (lines 16-66).
   
   3. When `exec_post_processing()` calls `pandas_postprocessing.pivot()` 
(implemented in
   `superset/utils/pandas_postprocessing/pivot.py:60-142`) with `columns=["a", 
"b"]`,
   `df.pivot_table(...)` (lines 122-131) produces a MultiIndex on `df.columns` 
where level 0
   is the metric name and subsequent levels correspond to each `columns` entry 
(`a`, then
   `b`). Because one metric is entirely NaN and 
`dropna=drop_missing_columns=True`, pandas
   drops that metric's columns completely.
   
   4. `_restore_dropped_metric_columns()` (lines 31-57 in `pivot.py`) then runs 
for this
   MultiIndex DataFrame. In the MultiIndex branch (lines 41-52), it collects 
only the last
   column level via `df.columns.get_level_values(-1).unique()` (line 45) and, 
for each
   missing metric, assigns new columns using 2‑tuples `df[(metric, cat)] = 
float("nan")`
   (line 52). For a MultiIndex with more than two levels (metric + `a` + `b`), 
this key has
   the wrong depth: on current pandas versions this either (a) raises a pandas 
error when
   attempting to insert a column with a tuple shorter than 
`df.columns.nlevels`, or (b)
   creates malformed/incomplete column labels that don't match the original 
`(metric, a, b)`
   shape. In either case, the all-NaN metric is not properly restored for every 
`(a, b)`
   combination, so subsequent post-processing such as `rename` and `rolling` 
(both decorated
   with `validate_column_args` in 
`superset/utils/pandas_postprocessing/utils.py:114-131`)
   can still fail with `InvalidPostProcessingError("Referenced columns not 
available in
   DataFrame.")` when they reference that metric.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fpandas_postprocessing%2Fpivot.py%0A%2A%2ALine%3A%2A%2A%2045%3A52%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20The%20restoration%20logic%20for%20MultiIndex%20pivots%20only%20uses%20the%20last%20column%20level%20%28%60get_level_values%28-1%29%60%29%20and%20then%20writes%202-tuples%20%28%60%28metric%2C%20cat%29%60%29%2C%20which%20is%20incorrect%20when%20the%20pivot%20has%20more%20than%20one%20%60columns%60%20dimension.%20For%20pivots%20like%20%60columns%3D%5B%22a%22%2C%22b%22%5D%60%2C%20this%20creates%20malformed%2Fincomplete%20column%20keys%20%28or%20can%20raise%20when%20tuple%20depth%20does%20not%20match%29%2C%20so%20all-NaN%20metrics%20are%20not%20restored%20correctly%20and%20downstream%20operations%20can%20still%20fail.%20Build%20restored%20keys%20using%20the%20full%20non-metric%20tuple%20for%20every
 
%20existing%20column%20combination%20%28all%20levels%20after%20metric%29%2C%20not%20just%20the%20last%20level.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fpandas_postprocessing%2Fpivot.py%0A%2A%2ALine%3A%2A%2A%2045%3A52%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20The%20restoration%20logic%20for%20Multi
 
Index%20pivots%20only%20uses%20the%20last%20column%20level%20%28%60get_level_values%28-1%29%60%29%20and%20then%20writes%202-tuples%20%28%60%28metric%2C%20cat%29%60%29%2C%20which%20is%20incorrect%20when%20the%20pivot%20has%20more%20than%20one%20%60columns%60%20dimension.%20For%20pivots%20like%20%60columns%3D%5B%22a%22%2C%22b%22%5D%60%2C%20this%20creates%20malformed%2Fincomplete%20column%20keys%20%28or%20can%20raise%20when%20tuple%20depth%20does%20not%20match%29%2C%20so%20all-NaN%20metrics%20are%20not%20restored%20correctly%20and%20downstream%20operations%20can%20still%20fail.%20Build%20restored%20keys%20using%20the%20full%20non-metric%20tuple%20for%20every%20existing%20column%20combination%20%28all%20levels%20after%20metric%29%2C%20not%20just%20the%20last%20level.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20i
 
s%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/pandas_postprocessing/pivot.py
   **Line:** 45:52
   **Comment:**
        *Logic Error: The restoration logic for MultiIndex pivots only uses the 
last column level (`get_level_values(-1)`) and then writes 2-tuples (`(metric, 
cat)`), which is incorrect when the pivot has more than one `columns` 
dimension. For pivots like `columns=["a","b"]`, this creates 
malformed/incomplete column keys (or can raise when tuple depth does not 
match), so all-NaN metrics are not restored correctly and downstream operations 
can still fail. Build restored keys using the full non-metric tuple for every 
existing column combination (all levels after metric), not just the last level.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40004&comment_hash=36f12915660f0454af59c262ef1b8c1abf812e601c4933ace680a80bc546d426&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40004&comment_hash=36f12915660f0454af59c262ef1b8c1abf812e601c4933ace680a80bc546d426&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(mixed-timeseries): preserve all-NaN metric columns after pivot when Jinja evaluates to NULL [superset]

Reply via email to