villebro opened a new pull request #9427: feat: Add post processing to 
QueryObject
URL: https://github.com/apache/incubator-superset/pull/9427
 
 
   ### CATEGORY
   
   Choose one
   
   - [ ] Bug Fix
   - [x] Enhancement (new features, refinement)
   - [ ] Refactor
   - [x] Add tests
   - [ ] Build / Development Environment
   - [ ] Documentation
   
   ### SUMMARY
   Currently the `/api/v1/query` endpoint doesn't support post-SQL data 
processing. This functionality is necessary for decoupling the backend from the 
frontend, because many of the data operations behind advanced visualizations 
are either not readily available in the JavaScript ecosystem or infeasible on 
the frontend due to network or computational expense.
   
   This PR adds post-query data processing functionality to Superset necessary 
for deprecating `viz.py`, namely
   - `aggregate` (same as SQL `GROUP BY`)
   - `pivot` (spreading group values into columns and aggregating each cell value)
   - `sort` (same as `ORDER BY`)
   - `rolling` (e.g. moving sums, averages)
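   As a rough illustration of the last two kinds of operation, the sketch below 
uses plain pandas calls (`pivot_table` and `rolling`). The sample data and the 
window size are made up for illustration, and this is not the option schema or 
the implementation introduced by this PR.

```python
import pandas as pd

# Hypothetical query result: one row per (ds, state) pair.
df = pd.DataFrame(
    {
        "ds": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-02",
             "2020-01-02", "2020-01-03", "2020-01-03"]
        ),
        "state": ["CA", "NY", "CA", "NY", "CA", "NY"],
        "sum__num": [1, 10, 2, 20, 3, 30],
    }
)

# Pivot: one row per date, one column per state, cells aggregated by sum.
pivoted = df.pivot_table(
    index="ds", columns="state", values="sum__num", aggfunc="sum"
)

# Rolling: moving sum over a two-row window; min_periods=1 keeps the first row.
rolling_sum = pivoted.rolling(window=2, min_periods=1).sum()
```
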
   
   This is done by leveraging functionality readily available in Pandas and 
Numpy. To use it, post-processing operations can be defined in the 
`post_processing` list of each query in the `queries` attribute of the 
`QueryContext` object. Below is an example from the unit tests, where the 
median and first quartile (25th percentile) are computed on an already 
aggregated query, and the result is then sorted in descending order by the 
first-quartile value:
   
   ```python
   {
       "queries": [
           {
               "granularity": "ds",
               "groupby": ["name", "state"],
               "metrics": [{"label": "sum__num"}],
               "filters": [],
               "row_limit": 100,
               "post_processing": [
                   {
                       "operation": "aggregate",
                       "options": {
                           "groupby": ["state"],
                           "aggregates": {
                               "q1": {
                                   "operator": "percentile",
                                   "column": "sum__num",
                                   "options": {"q": 25},
                               },
                               "median": {
                                   "operator": "median",
                                   "column": "sum__num",
                               },
                           },
                       },
                   },
                   {
                       "operation": "sort",
                       "options": {
                           "by": ["q1", "state"],
                           "ascending": {"q1": False},
                       },
                   },
               ],
           }
       ],
   }
   ```
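   For intuition, the two post-processing steps in the payload above roughly 
correspond to the following plain pandas/NumPy calls. This is a minimal sketch 
under assumptions: the sample rows and the `percentile` helper are hypothetical, 
and the code is not the implementation added in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical result of the SQL query (grouped by name and state).
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d", "e", "f"],
        "state": ["CA", "CA", "CA", "NY", "NY", "NY"],
        "sum__num": [10, 20, 30, 100, 200, 300],
    }
)


def percentile(q):
    """Hypothetical helper: build an aggregator for the q-th percentile."""
    def agg(series):
        return np.percentile(series, q)
    return agg


# "aggregate" step: group by state, compute q1 and median of sum__num.
aggregated = (
    df.groupby("state")
    .agg(
        q1=("sum__num", percentile(25)),
        median=("sum__num", "median"),
    )
    .reset_index()
)

# "sort" step: q1 descending, then state ascending (assumed default).
result = aggregated.sort_values(
    by=["q1", "state"], ascending=[False, True]
).reset_index(drop=True)
```

   Note that the second aggregation runs on the rows returned by the query, so 
`row_limit` bounds the input to the post-processing steps.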
   
   This feature should be seen as experimental at this stage. Furthermore, 
documentation will be added later, probably in the form of OpenAPI specs.
   
   ### TEST PLAN
   CI + local tests
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue: #9187
   - [ ] Changes UI
   - [ ] Requires DB Migration.
   - [ ] Confirm DB Migration upgrade and downgrade tested.
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   
   ### REVIEWERS
   @rusackas @suddjian @kristw @john-bodley @etr2460 

