EUbaldiEC opened a new pull request, #40738:
URL: https://github.com/apache/superset/pull/40738

   ### SUMMARY
   
   When creating heatmaps with normalisation across X/Y (and possibly with 
color normalisation), the plot fails when dynamic filters leave only one 
column or row. This was reproduced in both 6.0.0 and 6.1.0.
   
   Example is in the flights sample dataset, on a fresh install, putting 
`AIRLINE` as the X and `CITY` as Y (metric `COUNT(*)` or `SUM(AIRLINE_DELAY)` 
or anything else).
   
   The cause is that 
[`pandas_postprocessing/rank.py`](superset/utils/pandas_postprocessing/rank.py) 
fails when only a single category remains in the columns or rows.
   
   This fix changes how the pandas groupby/apply API is used so that it no 
longer breaks in that case.
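
The single-category case can be illustrated outside Superset. The sketch below is not the code from this PR; it is a minimal, assumed alternative (the helper name `rank_pct` and the toy data are hypothetical) that ranks within groups using pandas' `SeriesGroupBy.rank(pct=True)`, which returns a Series aligned with the input index whether one or many groups are present.

```python
from typing import Optional

import pandas as pd


def rank_pct(
    df: pd.DataFrame, metric: str, group_by: Optional[str] = None
) -> pd.DataFrame:
    """Hypothetical helper: add a percentile `rank` column for `metric`.

    When `group_by` is given, ranks are computed within each group.
    """
    out = df.copy()
    if group_by:
        # SeriesGroupBy.rank keeps the original row index, so the result
        # can be assigned as a single column even with only one group.
        out["rank"] = out.groupby(group_by)[metric].rank(pct=True)
    else:
        out["rank"] = out[metric].rank(pct=True)
    return out


# Toy data (assumption): every row belongs to the same category.
single = pd.DataFrame({"dept": ["dept0"] * 3, "asc_idx": [0, 5, 10]})
print(rank_pct(single, "asc_idx", "dept"))

# Reduced to a single row and a single category.
print(rank_pct(single.iloc[:1], "asc_idx", "dept"))
```

With a single row the percentile rank is 1.0, which corresponds to the single square at 100% expected when both rows and columns are reduced to one.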
   
   A unit test is added in 
[test_rank.py](tests/unit_tests/pandas_postprocessing/test_rank.py).
   
   Closes #40709
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   #### BEFORE
   
   <img width="1492" height="818" alt="Image" 
src="https://github.com/user-attachments/assets/411bc601-0ec8-4b98-ba63-d8ac208f7936"
 />
   
   When a filter reduces the data to a single column (for example 
`AIRLINE = 'AA'`), the error "Cannot set a DataFrame with multiple columns to 
the single column rank" is shown.
   
   <img width="1492" height="818" alt="Image" 
src="https://github.com/user-attachments/assets/95804232-048f-421d-81df-8b862cf994f7"
 />
   
   An unexpected behavior also occurs when both the rows and the columns are 
reduced to one:
   
   - When normalising across X or Y, the plot is empty (it should show a 
single square at 100% in both cases)
   
   <img width="1486" height="858" alt="Image" 
src="https://github.com/user-attachments/assets/4fc90c92-8266-4fa5-9093-d32b282460ed"
 />
   
   #### AFTER
   - Overall case (normalisation across heatmap / Y works the same way):
   
   <img width="1489" height="838" alt="Image" 
src="https://github.com/user-attachments/assets/7fb21b04-afab-44f3-9d0d-bb77916f666b"
 />
   
   - Single column and normalize across X:
   
   <img width="1485" height="840" alt="Image" 
src="https://github.com/user-attachments/assets/a46e19fd-d2c5-4604-a96b-3e72a388f367"
 />
   
   - Single row and normalize across Y:
   
   <img width="1493" height="841" alt="Image" 
src="https://github.com/user-attachments/assets/6a86e433-40a9-478f-b978-b8c6f187c1da"
 />
   
   - Single row AND column:
   
   <img width="1519" height="839" alt="Image" 
src="https://github.com/user-attachments/assets/ad808981-2fa0-4970-af25-544cd51b8572"
 />
   
   
   
   ### TESTING INSTRUCTIONS
   
   1. Follow the steps in #40709 to reproduce the error in 6.0.0 and 6.1.0.
   2. Run the new unit test against the old version of the code: it raises 
when the dataframe is reduced to a single category.
   3. Run it against the new code: the tests pass and the heatmap renders as 
shown at the bottom of #40709.
   
   A minimal working example is:
   
   ```python
   # `categories_df` and `rank_upstream` (the previous version of `rank`)
   # are assumed to be in scope.
   tmp_df = categories_df[categories_df["dept"] == "dept0"].reset_index(drop=True)
   tmp_df = tmp_df.drop(columns=["rank"], errors="ignore")

   # Previous version: raises when only one category is left
   rank_upstream(tmp_df, "asc_idx", "dept")
   # ValueError: Cannot set a DataFrame with multiple columns to the single
   # column rank

   # New version: returns the ranked dataframe
   tmp_df = categories_df[categories_df["dept"] == "dept0"].reset_index(drop=True)
   tmp_df = tmp_df.drop(columns=["rank"], errors="ignore")
   rank(tmp_df, "asc_idx", "dept")
   #   constant category   dept      name  asc_idx  desc_idx  idx_nulls      rank
   # 0    dummy     cat0  dept0   person0        0       100        0.0  0.047619
   # 1    dummy     cat2  dept0   person5        5        95        5.0  0.095238
   # 2    dummy     cat1  dept0  person10       10        90       10.0  0.142857
   # ...
   ```
   
   ### ADDITIONAL INFORMATION
   - [x] Has associated issue: #40709
   - [ ] Required feature flags:
   - [ ] API changes:
   - [ ] DB migration required:
   
   - [x] CI checks pass
   - [x] Tests added/updated
   - [ ] Documentation updated
   - [x] PR title follows conventions
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

