[GitHub] [incubator-superset] ktmud commented on a change in pull request #10286: feat: add contribution operation and fix cache_key bug

GitBox Fri, 10 Jul 2020 09:24:45 -0700


ktmud commented on a change in pull request #10286:
URL: 
https://github.com/apache/incubator-superset/pull/10286#discussion_r452922238




##########
File path: superset/charts/schemas.py
##########
@@ -395,6 +395,19 @@ class 
ChartDataSortOptionsSchema(ChartDataPostProcessingOperationOptionsSchema):
     aggregates = ChartDataAggregateConfigField()
 
 
+class 
ChartDataContributionOptionsSchema(ChartDataPostProcessingOperationOptionsSchema):
+    """
+    Contribution operation config.
+    """
+
+    orientation = fields.String(
+        description="Should cell values be calculated across the row or 
column.",
+        required=True,
+        validate=validate.OneOf(choices=("row", "column",)),

Review comment:
       Can these two also based on the enum?

##########
File path: superset/utils/pandas_postprocessing.py
##########
@@ -517,3 +518,29 @@ def _parse_location(location: str) -> Tuple[float, float, 
float]:
         return _append_columns(df, geodetic_df, columns)
     except ValueError:
         raise QueryObjectValidationError(_("Invalid geodetic string"))
+
+
+def contribution(
+    df: DataFrame, orientation: PostProcessingContributionOrientation
+) -> DataFrame:
+    """
+    Calculate cell contibution to row/column total.
+
+    :param df: DataFrame containing all-numeric data (temporal column ignored)
+    :param orientation: calculate by dividing cell with row/column total
+    :return: DataFrame with contributions, with temporal column at beginning 
if present
+    """
+    temporal_series: Optional[Series] = None
+    contribution_df = df.copy()
+    if DTTM_ALIAS in df.columns:
+        temporal_series = cast(Series, contribution_df.pop(DTTM_ALIAS))
+
+    if orientation == PostProcessingContributionOrientation.ROW:
+        contribution_dft = contribution_df.T
+        contribution_df = (contribution_dft / contribution_dft.sum()).T
+    else:
+        contribution_df = contribution_df / contribution_df.sum()

Review comment:
       Could probably utilize "axis" parameter for `sum`

##########
File path: superset/utils/pandas_postprocessing.py
##########
@@ -517,3 +518,29 @@ def _parse_location(location: str) -> Tuple[float, float, 
float]:
         return _append_columns(df, geodetic_df, columns)
     except ValueError:
         raise QueryObjectValidationError(_("Invalid geodetic string"))
+
+
+def contribution(
+    df: DataFrame, orientation: PostProcessingContributionOrientation
+) -> DataFrame:

Review comment:
       Maybe it makes more things to keep things simple, but is it possible to 
allow specifying which column(s) to compute on? I could imagine this operator 
being used to support "percent metrics" in table charts and it’ll need the 
ability to select specific columns.
   
   Maybe the API can be something like this
   
   ```
   {
     # compute row wise contribution and override the columns
     "columns": ["foo", "bar"],
     "orientation": "row"
   }
   
   {
     # compute column wise computation and insert new columns
     "columns": ["pct_abc"],
     "sources": ["abc"]
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [incubator-superset] ktmud commented on a change in pull request #10286: feat: add contribution operation and fix cache_key bug

Reply via email to