[GitHub] [superset] tijoparacka opened a new issue, #22438: Filtered aggregation to filter + aggregation while querying Apache Druid

GitBox Fri, 16 Dec 2022 03:28:29 -0800


tijoparacka opened a new issue, #22438:
URL: https://github.com/apache/superset/issues/22438


   Currently, Superset uses filtered aggregations when it builds the Druid 
native query.  A filtered aggregation on a string dimension performs poorly 
when applied to a large dataset.  
   
   This request is to also add the filters used in the filtered aggregations 
to the query-level filter.
   
   Example: the current query, using only filtered aggregations:
   
   {
     "queryType": "timeseries",
     "dataSource": {
       "type": "table",
       "name": "wikipedia"
     },
     "intervals": {
       "type": "intervals",
       "intervals": [
         "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
       ]
     },
     "granularity": {
       "type": "all"
     },
     "aggregations": [
       {
         "type": "filtered",
         "aggregator": {
           "type": "longSum",
           "name": "en_cnt",
           "fieldName": "added"
         },
         "filter": {
           "type": "selector",
           "dimension": "channel",
           "value": "#en.wikipedia"
         }
       },
       {
         "type": "filtered",
         "aggregator": {
           "type": "longSum",
           "name": "ar_cnt",
           "fieldName": "added"
         },
         "filter": {
           "type": "selector",
           "dimension": "channel",
           "value": "#ar.wikipedia"
         }
       }
     ]
   }
   
   This query scans all the segments within the interval.
   
   To improve performance, the same filters need to be added to the 
query-level filter as well.
   
   Example:
   {
     "queryType": "timeseries",
     "dataSource": {
       "type": "table",
       "name": "wikipedia"
     },
     "intervals": {
       "type": "intervals",
       "intervals": [
         "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
       ]
     },
     "filter": {
       "type": "in",
       "dimension": "channel",
       "values": [
         "#ar.wikipedia",
         "#en.wikipedia"
       ]
     },
     "granularity": {
       "type": "all"
     },
     "aggregations": [
       {
         "type": "filtered",
         "aggregator": {
           "type": "longSum",
           "name": "a0",
           "fieldName": "added"
         },
         "filter": {
           "type": "selector",
           "dimension": "channel",
           "value": "#en.wikipedia"
         },
         "name": "a0"
       },
       {
         "type": "filtered",
         "aggregator": {
           "type": "longSum",
           "name": "a1",
           "fieldName": "added"
         },
         "filter": {
           "type": "selector",
           "dimension": "channel",
           "value": "#ar.wikipedia"
         },
         "name": "a1"
       }
     ]
   }
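
   The rewrite requested above could be sketched roughly as follows. This is 
only an illustration: `hoist_aggregator_filters` is a hypothetical helper, not 
part of Superset or Druid, and it assumes a timeseries query in which every 
aggregator is a filtered aggregator with a selector filter on the same 
dimension. In any other case it returns the query unchanged.

```python
import copy


def hoist_aggregator_filters(query):
    """Return a copy of `query` with a query-level filter derived from its
    filtered aggregators. Hypothetical helper, not Superset or Druid code."""
    result = copy.deepcopy(query)
    aggregations = result.get("aggregations", [])

    # Grouping queries (groupBy, topN) could lose result rows, so this
    # sketch only rewrites timeseries queries.
    if result.get("queryType") != "timeseries" or not aggregations:
        return result

    dimensions = set()
    values = set()
    for agg in aggregations:
        agg_filter = agg.get("filter") or {}
        # An unfiltered or non-selector aggregator needs rows that this
        # sketch cannot describe, so the query is left untouched.
        if agg.get("type") != "filtered" or agg_filter.get("type") != "selector":
            return result
        value = agg_filter.get("value")
        if not isinstance(value, str):
            return result
        dimensions.add(agg_filter["dimension"])
        values.add(value)

    if len(dimensions) != 1:
        return result

    hoisted = {
        "type": "in",
        "dimension": dimensions.pop(),
        "values": sorted(values),
    }
    existing = result.get("filter")
    # Combine with an existing query-level filter instead of replacing it.
    result["filter"] = (
        {"type": "and", "fields": [existing, hoisted]} if existing else hoisted
    )
    return result


def _filtered_sum(name, channel):
    # Builds one filtered longSum aggregator like those in the issue.
    return {
        "type": "filtered",
        "aggregator": {"type": "longSum", "name": name, "fieldName": "added"},
        "filter": {"type": "selector", "dimension": "channel", "value": channel},
    }


example_query = {
    "queryType": "timeseries",
    "dataSource": {"type": "table", "name": "wikipedia"},
    "granularity": {"type": "all"},
    "aggregations": [
        _filtered_sum("en_cnt", "#en.wikipedia"),
        _filtered_sum("ar_cnt", "#ar.wikipedia"),
    ],
}

rewritten = hoist_aggregator_filters(example_query)
print(rewritten["filter"])
```

   The aggregators keep their own filters, so each sum is still computed per 
channel; the added query-level filter only narrows the rows they are applied 
to.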
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

