codeant-ai-for-open-source[bot] commented on code in PR #40005:
URL: https://github.com/apache/superset/pull/40005#discussion_r3215800509


##########
superset/utils/pandas_postprocessing/pivot.py:
##########
@@ -27,6 +28,35 @@
 )
 
 
+def _restore_dropped_metric_columns(
+    df: DataFrame, expected_metrics: list[str]
+) -> DataFrame:
+    """Re-add metric columns that pivot_table dropped due to all-NaN values.
+
+    When drop_missing_columns=True, pandas pivot_table silently removes columns
+    whose entries are all NaN. This breaks downstream post-processing steps
+    (rename, rolling) that use validate_column_args to assert the columns 
exist.
+    Restoring the columns as all-NaN preserves the expected schema.
+    """
+    if isinstance(df.columns, pd.MultiIndex):
+        existing_metrics = set(df.columns.get_level_values(0))
+        missing = [m for m in expected_metrics if m not in existing_metrics]
+        if missing:
+            category_values = (
+                df.columns.get_level_values(-1).unique()
+                if len(df.columns) > 0
+                else [None]
+            )
+            for metric in missing:
+                for cat in category_values:
+                    df[(metric, cat)] = float("nan")

Review Comment:
   **Suggestion:** MultiIndex restoration is incorrect for pivots with more 
than one `columns` dimension: it only captures the last level 
(`get_level_values(-1)`) and rebuilds missing metric columns as `(metric, 
cat)`. For 3+ level column indexes this drops higher-level grouping keys, so 
restored columns do not match the original pivot schema and downstream 
flatten/rename/rolling references can still fail. Rebuild missing columns using 
full non-metric tuples from existing columns (all levels after level 0), not 
just the last level. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Pivot post-processing breaks for multi-column series configurations.
   - ⚠️ Mixed-timeseries and pivot table charts can still error.
   - ⚠️ Flattened exports may miss expected metric/category combinations.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a visualization that uses a pivot post-processing with multiple 
series
   columns so that `pivot_operator` in
   `superset/migrations/shared/migrate_viz/query_functions.py:49-75` builds a 
post-processing
   entry `{"operation": "pivot", "options": {"index": [x_axis_label], 
"columns": [col1, col2,
   ...], "drop_missing_columns": True, "aggregates": {...}}}` where `columns` 
has length >= 2
   and `drop_missing_columns` is true (derived from `show_empty_columns` at 
line 73).
   
   2. Ensure one of the metrics is Jinja-based and evaluates to NULL for every 
row (as
   described in the PR summary), so the query result DataFrame passed into
   `QueryObject.post_process_data()` in 
`superset/common/query_object.py:510-524` has an
   all-NaN metric column (e.g. `"metric_jinja"`) plus at least one non-NaN 
metric and the
   series columns used in step 1.
   
   3. When `post_process_data()` processes the `"pivot"` operation, it calls
   `superset.utils.pandas_postprocessing.pivot.pivot()` via 
`getattr(pandas_postprocessing,
   operation)(df, **options)` at `superset/common/query_object.py:22-23`, and 
inside
   `pivot()` (`superset/utils/pandas_postprocessing/pivot.py:60-72`) pandas 
`pivot_table`
   builds a MultiIndex on `df.columns` whose level 0 is metric names and 
subsequent levels
   correspond to each entry in `columns` (e.g. `(metric, col1_value, 
col2_value)`); because
   the Jinja metric is all NaN, all of its `(metric_jinja, *, *)` columns are 
dropped by
   `pivot_table(..., dropna=True)`.
   
   4. Since `drop_missing_columns` is true, `pivot()` then calls
   `_restore_dropped_metric_columns()` at
   `superset/utils/pandas_postprocessing/pivot.py:31-57`, where for a 
MultiIndex it computes
   `category_values = df.columns.get_level_values(-1).unique()` (only the last 
level) at
   lines 45-48 and re-adds missing metrics using `df[(metric, cat)] = 
float("nan")` at lines
   50-52; on a 3+ level MultiIndex this recreates columns keyed as `(metric,
   last_level_value)` instead of `(metric, col1_value, col2_value, ...)`, 
yielding an
   inconsistent MultiIndex that either raises during assignment (tuple length 
mismatch) or
   produces malformed column keys, so downstream consumers like `flatten()`
   (`superset/utils/pandas_postprocessing/flatten.py:90-101`) and any logic 
expecting full
   `(metric, col1, col2, ...)` tuples operate on incorrect schema and 
metric/category
   combinations remain missing for the all-NaN metric.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fpandas_postprocessing%2Fpivot.py%0A%2A%2ALine%3A%2A%2A%2045%3A52%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20MultiIndex%20restoration%20is%20incorrect%20for%20pivots%20with%20more%20than%20one%20%60columns%60%20dimension%3A%20it%20only%20captures%20the%20last%20level%20%28%60get_level_values%28-1%29%60%29%20and%20rebuilds%20missing%20metric%20columns%20as%20%60%28metric%2C%20cat%29%60.%20For%203%2B%20level%20column%20indexes%20this%20drops%20higher-level%20grouping%20keys%2C%20so%20restored%20columns%20do%20not%20match%20the%20original%20pivot%20schema%20and%20downstream%20flatten%2Frename%2Frolling%20references%20can%20still%20fail.%20Rebuild%20missing%20columns%20using%20full%20non-metric%20tuples%20from%20existing%20columns%20%28all%20levels%20after%20level%200%29%2C%20not%20just%20the%20last%20le
 
vel.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Futils%2Fpandas_postprocessing%2Fpivot.py%0A%2A%2ALine%3A%2A%2A%2045%3A52%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20MultiIndex%20restoration%20is%20incorrect%20for%20pivots%20with%20more%20than%20one%20%60columns%60%20dimension%3A%20it%20only%20captures%20the%2
 
0last%20level%20%28%60get_level_values%28-1%29%60%29%20and%20rebuilds%20missing%20metric%20columns%20as%20%60%28metric%2C%20cat%29%60.%20For%203%2B%20level%20column%20indexes%20this%20drops%20higher-level%20grouping%20keys%2C%20so%20restored%20columns%20do%20not%20match%20the%20original%20pivot%20schema%20and%20downstream%20flatten%2Frename%2Frolling%20references%20can%20still%20fail.%20Rebuild%20missing%20columns%20using%20full%20non-metric%20tuples%20from%20existing%20columns%20%28all%20levels%20after%20level%200%29%2C%20not%20just%20the%20last%20level.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%2
 
0fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/pandas_postprocessing/pivot.py
   **Line:** 45:52
   **Comment:**
        *Logic Error: MultiIndex restoration is incorrect for pivots with more 
than one `columns` dimension: it only captures the last level 
(`get_level_values(-1)`) and rebuilds missing metric columns as `(metric, 
cat)`. For 3+ level column indexes this drops higher-level grouping keys, so 
restored columns do not match the original pivot schema and downstream 
flatten/rename/rolling references can still fail. Rebuild missing columns using 
full non-metric tuples from existing columns (all levels after level 0), not 
just the last level.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40005&comment_hash=0855b54228922302d5feaf7f3130c65eebf867506c9b82ca50d842954dd65711&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40005&comment_hash=0855b54228922302d5feaf7f3130c65eebf867506c9b82ca50d842954dd65711&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to