codeant-ai-for-open-source[bot] commented on code in PR #40005:
URL: https://github.com/apache/superset/pull/40005#discussion_r3215800509
##########
superset/utils/pandas_postprocessing/pivot.py:
##########
@@ -27,6 +28,35 @@
)
+def _restore_dropped_metric_columns(
+ df: DataFrame, expected_metrics: list[str]
+) -> DataFrame:
+ """Re-add metric columns that pivot_table dropped due to all-NaN values.
+
+ When drop_missing_columns=True, pandas pivot_table silently removes columns
+ whose entries are all NaN. This breaks downstream post-processing steps
+ (rename, rolling) that use validate_column_args to assert the columns
exist.
+ Restoring the columns as all-NaN preserves the expected schema.
+ """
+ if isinstance(df.columns, pd.MultiIndex):
+ existing_metrics = set(df.columns.get_level_values(0))
+ missing = [m for m in expected_metrics if m not in existing_metrics]
+ if missing:
+ category_values = (
+ df.columns.get_level_values(-1).unique()
+ if len(df.columns) > 0
+ else [None]
+ )
+ for metric in missing:
+ for cat in category_values:
+ df[(metric, cat)] = float("nan")
Review Comment:
**Suggestion:** MultiIndex restoration is incorrect for pivots with more
than one `columns` dimension: it only captures the last level
(`get_level_values(-1)`) and rebuilds missing metric columns as `(metric,
cat)`. For 3+ level column indexes this drops higher-level grouping keys, so
restored columns do not match the original pivot schema and downstream
flatten/rename/rolling references can still fail. Rebuild missing columns using
full non-metric tuples from existing columns (all levels after level 0), not
just the last level. [logic error]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Pivot post-processing breaks for multi-column series configurations.
- ⚠️ Mixed-timeseries and pivot table charts can still error.
- ⚠️ Flattened exports may miss expected metric/category combinations.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure a visualization that uses a pivot post-processing with multiple
series
columns so that `pivot_operator` in
`superset/migrations/shared/migrate_viz/query_functions.py:49-75` builds a
post-processing
entry `{"operation": "pivot", "options": {"index": [x_axis_label],
"columns": [col1, col2,
...], "drop_missing_columns": True, "aggregates": {...}}}` where `columns`
has length >= 2
and `drop_missing_columns` is true (derived from `show_empty_columns` at
line 73).
2. Ensure one of the metrics is Jinja-based and evaluates to NULL for every
row (as
described in the PR summary), so the query result DataFrame passed into
`QueryObject.post_process_data()` in
`superset/common/query_object.py:510-524` has an
all-NaN metric column (e.g. `"metric_jinja"`) plus at least one non-NaN
metric and the
series columns used in step 1.
3. When `post_process_data()` processes the `"pivot"` operation, it calls
`superset.utils.pandas_postprocessing.pivot.pivot()` via
`getattr(pandas_postprocessing,
operation)(df, **options)` at `superset/common/query_object.py:22-23`, and
inside
`pivot()` (`superset/utils/pandas_postprocessing/pivot.py:60-72`) pandas
`pivot_table`
builds a MultiIndex on `df.columns` whose level 0 is metric names and
subsequent levels
correspond to each entry in `columns` (e.g. `(metric, col1_value,
col2_value)`); because
the Jinja metric is all NaN, all of its `(metric_jinja, *, *)` columns are
dropped by
`pivot_table(..., dropna=True)`.
4. Since `drop_missing_columns` is true, `pivot()` then calls
`_restore_dropped_metric_columns()` at
`superset/utils/pandas_postprocessing/pivot.py:31-57`, where for a
MultiIndex it computes
`category_values = df.columns.get_level_values(-1).unique()` (only the last
level) at
lines 45-48 and re-adds missing metrics using `df[(metric, cat)] =
float("nan")` at lines
50-52; on a 3+ level MultiIndex this recreates columns keyed as `(metric,
last_level_value)` instead of `(metric, col1_value, col2_value, ...)`,
yielding an
inconsistent MultiIndex that either raises during assignment (tuple length
mismatch) or
produces malformed column keys, so downstream consumers like `flatten()`
(`superset/utils/pandas_postprocessing/flatten.py:90-101`) and any logic
expecting full
`(metric, col1, col2, ...)` tuples operate on incorrect schema and
metric/category
combinations remain missing for the all-NaN metric.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fpandas_postprocessing%2Fpivot.py%0A%2A%2ALine%3A%2A%2A%2045%3A52%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20MultiIndex%20restoration%20is%20incorrect%20for%20pivots%20with%20more%20than%20one%20%60columns%60%20dimension%3A%20it%20only%20captures%20the%20last%20level%20%28%60get_level_values%28-1%29%60%29%20and%20rebuilds%20missing%20metric%20columns%20as%20%60%28metric%2C%20cat%29%60.%20For%203%2B%20level%20column%20indexes%20this%20drops%20higher-level%20grouping%20keys%2C%20so%20restored%20columns%20do%20not%20match%20the%20original%20pivot%20schema%20and%20downstream%20flatten%2Frename%2Frolling%20references%20can%20still%20fail.%20Rebuild%20missing%20columns%20using%20full%20non-metric%20tuples%20from%20existing%20columns%20%28all%20levels%20after%20level%200%29%2C%20not%20just%20the%20last%20le
vel.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fpandas_postprocessing%2Fpivot.py%0A%2A%2ALine%3A%2A%2A%2045%3A52%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20MultiIndex%20restoration%20is%20incorrect%20for%20pivots%20with%20more%20than%20one%20%60columns%60%20dimension%3A%20it%20only%20captures%20the%2
0last%20level%20%28%60get_level_values%28-1%29%60%29%20and%20rebuilds%20missing%20metric%20columns%20as%20%60%28metric%2C%20cat%29%60.%20For%203%2B%20level%20column%20indexes%20this%20drops%20higher-level%20grouping%20keys%2C%20so%20restored%20columns%20do%20not%20match%20the%20original%20pivot%20schema%20and%20downstream%20flatten%2Frename%2Frolling%20references%20can%20still%20fail.%20Rebuild%20missing%20columns%20using%20full%20non-metric%20tuples%20from%20existing%20columns%20%28all%20levels%20after%20level%200%29%2C%20not%20just%20the%20last%20level.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%2
0fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/pandas_postprocessing/pivot.py
**Line:** 45:52
**Comment:**
*Logic Error: MultiIndex restoration is incorrect for pivots with more
than one `columns` dimension: it only captures the last level
(`get_level_values(-1)`) and rebuilds missing metric columns as `(metric,
cat)`. For 3+ level column indexes this drops higher-level grouping keys, so
restored columns do not match the original pivot schema and downstream
flatten/rename/rolling references can still fail. Rebuild missing columns using
full non-metric tuples from existing columns (all levels after level 0), not
just the last level.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40005&comment_hash=0855b54228922302d5feaf7f3130c65eebf867506c9b82ca50d842954dd65711&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40005&comment_hash=0855b54228922302d5feaf7f3130c65eebf867506c9b82ca50d842954dd65711&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]