villebro commented on a change in pull request #18782:
URL: https://github.com/apache/superset/pull/18782#discussion_r809763411
##########
File path: superset/utils/pandas_postprocessing/contribution.py
##########
@@ -71,5 +73,7 @@ def contribution(
numeric_df = numeric_df[columns]
axis = 0 if orientation == PostProcessingContributionOrientation.COLUMN else 1
numeric_df = numeric_df / numeric_df.values.sum(axis=axis, keepdims=True)
+ # replace infinity and nan with 0 in dataframe
+ numeric_df.replace(to_replace=[np.Inf, -np.Inf, np.nan], value=0, inplace=True)
Review comment:
Thanks for the explanation! This is mostly a nit, but please hear me out
😆 I agree that we need to fill nulls with zeros before doing the contribution
calculation. Mathematically speaking, though, once the contribution
calculation is done, I think leaving the infinite values as null is more
appropriate than replacing them with 0. In the example, I feel this should be
the correct result for ROW level contribution, as there isn't anything to
contribute to in the last row:
```
__timestamp a b c
0 2020-07-16 14:49:00 0.50 0.50 0.0
1 2020-07-16 14:50:00 0.25 0.75 0.0
2 2020-07-16 14:51:00 NaN NaN NaN
```
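For illustration, here is a minimal sketch of the behaviour I have in mind. It is not the code from this PR: the helper name `row_contribution` and the input values are made up for the example, and it only covers the ROW orientation. Nulls are filled with zeros before dividing, and any infinite results are turned back into nulls afterwards instead of zeros.

```python
import numpy as np
import pandas as pd


def row_contribution(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: each cell as a share of its row total."""
    # Fill nulls with zeros *before* the contribution calculation.
    numeric_df = df.fillna(0)
    # Divide every row by its own sum (row-wise alignment via axis=0).
    result = numeric_df.div(numeric_df.sum(axis=1), axis=0)
    # A zero row total yields NaN (0/0) or +/-inf (x/0); keep both as null.
    return result.replace([np.inf, -np.inf], np.nan)


# Made-up input: the last row sums to zero, so it has nothing to contribute to.
df = pd.DataFrame(
    {
        "a": [1.0, 1.0, 0.0],
        "b": [1.0, 3.0, 0.0],
        "c": [0.0, np.nan, 0.0],
    }
)
result = row_contribution(df)
print(result)
```

With this input the first two rows come out as 0.50/0.50/0.0 and 0.25/0.75/0.0, and the zero-sum row stays NaN across the board, which matches the table above.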
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]