a2l007 opened a new issue #6057: Broker sends sequential requests to the 
historical for union queries
URL: https://github.com/apache/incubator-druid/issues/6057
 
 
   While analyzing groupBy queries on union datasources, we have seen that if a 
historical holds segments for more than one of the datasources in a query, the 
broker sends multiple requests to that historical, but serially rather than in 
parallel.
   For example, running the following union query:
   
   ```
   {
     "queryType": "groupBy",
     "dataSource": {
       "type": "union",
       "dataSources": [
         {
           "type": "table",
           "name": "tableA"
         },
         {
           "type": "table",
           "name": "tableB"
         },
         {
           "type": "table",
           "name": "tableC"
         }
       ]
     },
     "intervals": {
       "type": "LegacySegmentSpec",
       "intervals": [
         "2018-07-23T00:00:00.000Z\/2018-07-24T00:00:00.000Z"
       ]
     },
     "virtualColumns": [
       
     ],
     "filter": null,
     "granularity": "DAY",
     "dimensions": [
       {
         "type": "default",
         "dimension": "acc_id"
       }
     ],
     "aggregations": [
       {
         "fieldName": "clicks",
         "name": "clicks",
         "type": "longSum"
       }
     ],
     "postAggregations": [
       
     ],
     "having": null,
     "limitSpec": {
       "type": "NoopLimitSpec"
     },
     "descending": false
   }
   ```
   generates logs such as:
   ```
   2018-07-24T14:50:06,500 INFO [qtp1256384385-231[groupBy_q_test_1]] com.metamx.http.client.pool.ChannelResourceFactory - Generating: https://tier.historical.foo.com:4443
   2018-07-24T14:50:40,366 INFO [qtp1256384385-231[groupBy_q_test_1]] com.metamx.http.client.pool.ChannelResourceFactory - Generating: https://tier.historical.foo.com:4443
   ```
   
   This historical contains segments for datasources `tableA` and `tableB`, so 
the broker sends two requests to it. As the timestamps show, however, the second 
request is issued about 34 seconds after the first.
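
For reference, the gap between the two log lines can be computed directly from their timestamps. This is only a small sketch; the helper name `gap_seconds` and the format constant are illustrative and not part of Druid.

```python
from datetime import datetime

# Timestamp layout used in the broker log lines above (comma before millis).
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"

def gap_seconds(first, second):
    """Return the number of seconds between two log timestamps."""
    t1 = datetime.strptime(first, LOG_TS_FORMAT)
    t2 = datetime.strptime(second, LOG_TS_FORMAT)
    return (t2 - t1).total_seconds()

# The two timestamps are copied from the log excerpt above.
gap = gap_seconds("2018-07-24T14:50:06,500", "2018-07-24T14:50:40,366")
print(f"{gap:.3f} s between the two requests")
```

The result is 33.866 seconds, which is the delay a parallel dispatch would be expected to hide.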
    
[BrokerServerView](https://github.com/apache/incubator-druid/blob/master/server/src/main/java/io/druid/client/BrokerServerView.java#L291)
 confirms this behaviour: for a given query, at most one request to a 
particular historical is in flight at any instant. This serialization adds 
query execution delay for union queries.
   Before I investigate whether this can be parallelized, I wanted to check 
with the community: is this already a known limitation of union queries, or is 
it actually a bug?
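
To illustrate what parallelizing would buy, here is a minimal, language-neutral sketch in Python. It is not Druid code: `query_historical`, `run_sequential`, `run_parallel` and the 0.2 s simulated latency are all hypothetical stand-ins, chosen only to contrast issuing the per-datasource requests one after another with issuing them concurrently.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one broker-to-historical request; the sleep
# simulates the time the historical spends on a single datasource.
def query_historical(datasource, latency_s=0.2):
    time.sleep(latency_s)
    return {"datasource": datasource, "rows": []}

def run_sequential(datasources):
    # One request at a time: total time is roughly the sum of the latencies.
    return [query_historical(ds) for ds in datasources]

def run_parallel(datasources):
    # All requests in flight at once: total time is roughly the slowest one.
    with ThreadPoolExecutor(max_workers=len(datasources)) as pool:
        return list(pool.map(query_historical, datasources))

def timed(fn, datasources):
    # Run fn and return its results together with the elapsed wall time.
    start = time.perf_counter()
    results = fn(datasources)
    return results, time.perf_counter() - start

if __name__ == "__main__":
    tables = ["tableA", "tableB"]
    _, seq_s = timed(run_sequential, tables)
    _, par_s = timed(run_parallel, tables)
    print(f"sequential: {seq_s:.2f}s, parallel: {par_s:.2f}s")
```

With two datasources on the same historical, the sequential variant takes about twice the per-request latency, while the parallel variant takes about one. Whether the broker can safely do the same depends on details this sketch ignores, such as connection pooling and load on the historical.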
   
