EUbaldiEC opened a new pull request, #40738: URL: https://github.com/apache/superset/pull/40738
### SUMMARY

When creating heatmaps (tested in both 6.0.0 and 6.1.0) with normalisation across X/Y (and possibly color normalisation), the plot fails when dynamic filters leave only one column or row. An example is the flights sample dataset on a fresh install, with `AIRLINE` as X and `CITY` as Y (metric `COUNT(*)`, `SUM(AIRLINE_DELAY)`, or anything else).

The cause is that [`pandas_postprocessing/rank.py`](superset/utils/pandas_postprocessing/rank.py) fails when the grouped dimension (rows or columns) contains a single category. This fix uses the pandas groupby->apply API in a way that does not break in that case. A unit test is added in [test_rank.py](tests/unit_tests/pandas_postprocessing/test_rank.py).

Closes #40709

### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

#### BEFORE

### Screenshots/recordings

The unfiltered example described above (flights sample dataset, `AIRLINE` as X, `CITY` as Y) renders correctly:

<img width="1492" height="818" alt="Image" src="https://github.com/user-attachments/assets/411bc601-0ec8-4b98-ba63-d8ac208f7936" />

When a filter reduces the data to a single column (for example `AIRLINE = 'AA'`), the chart shows the error "Cannot set a DataFrame with multiple columns to the single column rank".

<img width="1492" height="818" alt="Image" src="https://github.com/user-attachments/assets/95804232-048f-421d-81df-8b862cf994f7" />

Unexpected behavior also occurs when both the rows and the columns are reduced to 1:

- When normalising across X or Y, the plot is empty (it should show a single square with 100% in both cases).

<img width="1486" height="858" alt="Image" src="https://github.com/user-attachments/assets/4fc90c92-8266-4fa5-9093-d32b282460ed" />

#### AFTER

- Overall case (normalisation across heatmap / Y works the same way):

  <img width="1489" height="838" alt="Image" src="https://github.com/user-attachments/assets/7fb21b04-afab-44f3-9d0d-bb77916f666b" />

- Single column and normalize across X:

  <img width="1485" height="840" alt="Image" src="https://github.com/user-attachments/assets/a46e19fd-d2c5-4604-a96b-3e72a388f367" />

- Single row and normalize across Y:

  <img width="1493" height="841" alt="Image" src="https://github.com/user-attachments/assets/6a86e433-40a9-478f-b978-b8c6f187c1da" />

- Single row AND column:

  <img width="1519" height="839" alt="Image" src="https://github.com/user-attachments/assets/ad808981-2fa0-4970-af25-544cd51b8572" />

### TESTING INSTRUCTIONS

1. Follow the steps in #40709 to reproduce the error in 6.0.0 and 6.1.0.
2. Run the new unit test against the old version of the code: it raises when the dataframe is reduced to a single category.
3. With the new code, the tests pass and the heatmap renders as shown at the bottom of #40709.

A minimal working example is:

```python
# `rank_upstream` denotes the previous version of `rank`; `rank` is the new one.
# Keep a single category in the grouping column ("dept") to trigger the bug.
tmp_df = categories_df[categories_df["dept"] == "dept0"].reset_index(drop=True)
tmp_df.drop(columns=["rank"], errors="ignore", inplace=True)
rank_upstream(tmp_df, "asc_idx", "dept")  # Prev version
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)
# ....
# ValueError: Cannot set a DataFrame with multiple columns to the single column rank

tmp_df = categories_df[categories_df["dept"] == "dept0"].reset_index(drop=True)
tmp_df.drop(columns=["rank"], errors="ignore", inplace=True)
rank(tmp_df, "asc_idx", "dept")  # NEW version
# >   constant category   dept      name  asc_idx  desc_idx  idx_nulls      rank
# > 0    dummy     cat0  dept0   person0        0       100        0.0  0.047619
# > 1    dummy     cat2  dept0   person5        5        95        5.0  0.095238
# > 2    dummy     cat1  dept0  person10       10        90       10.0  0.142857
# ...
```

### ADDITIONAL INFORMATION

- [x] Has associated issue: #40709
- [ ] Required feature flags:
- [ ] API changes:
- [ ] DB migration required:
- [x] CI checks pass
- [x] Tests added/updated
- [ ] Documentation updated
- [x] PR title follows conventions

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
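
For readers who want to try the single-category case from the minimal example without the Superset test fixtures, here is a standalone sketch. It is not the code from this PR: `rank_within_group` is a hypothetical helper, the two small frames are made up for illustration, and it assumes the rank in question is a per-group percentile rank of the metric. It relies on `SeriesGroupBy.rank`, which returns a Series aligned with the input index regardless of how many groups exist, so the column assignment behaves the same for one group as for several.

```python
import pandas as pd


def rank_within_group(df, metric, group_by=None):
    """Hypothetical helper (not Superset's `rank`): add a percentile-rank column.

    Assumption: the rank is computed per group of `group_by` when given,
    otherwise over the whole frame.
    """
    out = df.copy()
    if group_by:
        # SeriesGroupBy.rank keeps the original index, so the result is
        # always a single Series, even when only one group is present.
        out["rank"] = out.groupby(group_by)[metric].rank(pct=True)
    else:
        out["rank"] = out[metric].rank(pct=True)
    return out


# Illustrative data: only one category left in the grouping column.
single = pd.DataFrame({"dept": ["dept0"] * 3, "asc_idx": [0, 5, 10]})
ranked_single = rank_within_group(single, "asc_idx", "dept")

# Illustrative data: two categories, ranked independently.
multi = pd.DataFrame(
    {"dept": ["dept0", "dept0", "dept1", "dept1"], "asc_idx": [0, 5, 10, 15]}
)
ranked_multi = rank_within_group(multi, "asc_idx", "dept")

print(ranked_single)
print(ranked_multi)
```

The helper copies its input so the caller's frame is left untouched; whether the real post-processing function mutates in place is not something this sketch tries to mirror.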
