[jira] [Commented] (CALCITE-1656) Sub-Optimal Druid Query planning - Does not Prune columns for DruidQuery

Nishant Bangarwa (JIRA) Wed, 01 Mar 2017 11:41:12 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890902#comment-15890902
 ]


Nishant Bangarwa commented on CALCITE-1656:
-------------------------------------------

[~julianhyde]: I have addressed the comments above. One discussion is still 
pending: whether to include the number of fields being queried in the cost 
function of DruidQuery. I discussed this with [~jcamachorodriguez], and it seems 
that DruidQuery may not be the best place to adjust the cost for reading more or 
fewer columns; ideally that belongs in TableScan instead. However, the existing 
cost model is based on the number of rows (i.e. their cardinality), not on the 
number of columns that need to be scanned. IMO the number of columns scanned is 
an important measure for columnar databases, and we should consider it for 
TableScan as well, or add a new ColumnarTableScan that accounts for it. Any 
thoughts on this?
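To make the columnar idea concrete, here is a minimal sketch of a scan cost 
that scales with the fraction of columns read. It is normalized so that reading 
every column costs exactly the row count, which keeps it compatible with the 
existing row-based model. `ColumnarScanCost` and `cost` are hypothetical names, 
not Calcite API, and the row count in `main` is made up.

```java
/**
 * Hypothetical columnar scan cost (not Calcite API). The cost is expressed in
 * "row equivalents" so that a scan reading every column costs exactly the
 * row count, matching a row-based cost model.
 */
final class ColumnarScanCost {
  private ColumnarScanCost() {}

  /** Cost of scanning rowCount rows while reading columnsRead of totalColumns. */
  static double cost(double rowCount, int columnsRead, int totalColumns) {
    if (totalColumns <= 0 || columnsRead < 0 || columnsRead > totalColumns) {
      throw new IllegalArgumentException("invalid column counts");
    }
    // Assumption: in a columnar store each column is read separately, so I/O
    // is proportional to the fraction of columns that must be read.
    return rowCount * ((double) columnsRead / totalColumns);
  }

  public static void main(String[] args) {
    double rows = 1_000_000; // made-up row count
    // 21 = 16 dimensions + 5 metrics in the issue's unpruned Druid query.
    System.out.println("all columns:    " + cost(rows, 21, 21));
    // e.g. only "countryName" is actually needed.
    System.out.println("pruned columns: " + cost(rows, 1, 21));
  }
}
```

With this shape, two plans with the same row count are no longer tied: the one 
that reads fewer columns is cheaper.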

I think it may be OK to include the number of fields in the DruidQuery cost for 
now, until we improve our cost model to also account for the number of columns 
being scanned. 
Do you agree?
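A minimal sketch of that interim approach: multiply DruidQuery's row-based cost 
by a factor that grows linearly with the number of fields the query returns, 
clamped to a fixed range. Between two plans with the same row count, the one 
that pushes the projection into Druid then comes out cheaper. 
`FieldCountCostFactor`, the 2..100 field range and the 1.0x..2.0x multiplier are 
illustrative assumptions, not values taken from Calcite.

```java
/**
 * Hypothetical stopgap (not Calcite API): scale a DruidQuery's row-based cost
 * by a factor that grows with the number of fields it asks Druid to return.
 */
final class FieldCountCostFactor {
  private FieldCountCostFactor() {}

  // Assumed tuning constants, chosen only for illustration.
  static final int MIN_FIELDS = 2;
  static final int MAX_FIELDS = 100;
  static final double MIN_FACTOR = 1.0;
  static final double MAX_FACTOR = 2.0;

  /** Maps x from [minX, maxX] onto [minY, maxY], clamping values outside the range. */
  static double linear(int x, int minX, int maxX, double minY, double maxY) {
    if (x <= minX) {
      return minY;
    }
    if (x >= maxX) {
      return maxY;
    }
    return minY + (maxY - minY) * (x - minX) / (double) (maxX - minX);
  }

  /** Base row-based cost multiplied by the field-count factor. */
  static double adjustedCost(double baseRowCost, int fieldsQueried) {
    return baseRowCost
        * linear(fieldsQueried, MIN_FIELDS, MAX_FIELDS, MIN_FACTOR, MAX_FACTOR);
  }

  public static void main(String[] args) {
    double base = 1000.0; // same row-based cost for both plans (made up)
    System.out.println("unpruned (21 fields): " + adjustedCost(base, 21));
    System.out.println("pruned (3 fields):    " + adjustedCost(base, 3));
  }
}
```

Clamping keeps the adjustment bounded, so the field count can break ties 
between plans without overwhelming the row-count estimate.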

> Sub-Optimal Druid Query planning - Does not Prune columns for DruidQuery 
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-1656
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1656
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid
>            Reporter: Nishant Bangarwa
>            Assignee: Nishant Bangarwa
>              Labels: performance
>             Fix For: 1.12.0
>
>
> Consider the query below:
> {code}
> select "countryName", floor("time" to DAY), cast(count(*) as integer) as c
> from "wiki"
> where floor("time" to DAY) >= '1997-01-01 00:00:00'
>   and floor("time" to DAY) < '1997-09-01 00:00:00'
> group by "countryName", floor("time" TO DAY)
> order by c limit 5
> {code} 
> The resulting Druid query:
> {code}
> {
>   "queryType": "select",
>   "dataSource": "wikiticker",
>   "descending": false,
>   "intervals": [
>     "1900-01-09T00:00:00.000/2992-01-10T00:00:00.000"
>   ],
>   "dimensions": [
>     "channel",
>     "cityName",
>     "comment",
>     "countryIsoCode",
>     "countryName",
>     "isAnonymous",
>     "isMinor",
>     "isNew",
>     "isRobot",
>     "isUnpatrolled",
>     "metroCode",
>     "namespace",
>     "page",
>     "regionIsoCode",
>     "regionName",
>     "user"
>   ],
>   "metrics": [
>     "count",
>     "added",
>     "deleted",
>     "delta",
>     "user_unique"
>   ],
>   "granularity": "all",
>   "pagingSpec": {
>     "threshold": 16384,
>     "fromNext": true
>   },
>   "context": {
>     "druid.query.fetch": false
>   }
> }
> {code} 
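> Note that the Druid query above requests columns that are not required: the 
> SQL query only references "countryName" and "time", yet the Druid query lists 
> 16 dimensions and 5 metrics.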
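The pruning itself amounts to keeping only those datasource columns that the 
plan actually references. A minimal sketch, assuming the referenced column names 
have already been collected from the SQL query (here just "countryName" and 
"time"); `DruidColumnPruning.prune` is a hypothetical helper, not part of the 
Calcite Druid adapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of the Calcite Druid adapter). */
final class DruidColumnPruning {
  private DruidColumnPruning() {}

  /** Returns the columns of available that the query references, in their original order. */
  static List<String> prune(List<String> available, Set<String> referenced) {
    List<String> kept = new ArrayList<>();
    for (String column : available) {
      if (referenced.contains(column)) {
        kept.add(column);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Dimension and metric lists copied from the unpruned Druid query above.
    List<String> dimensions = List.of("channel", "cityName", "comment",
        "countryIsoCode", "countryName", "isAnonymous", "isMinor", "isNew",
        "isRobot", "isUnpatrolled", "metroCode", "namespace", "page",
        "regionIsoCode", "regionName", "user");
    List<String> metrics = List.of("count", "added", "deleted", "delta", "user_unique");
    // Assumed: column names referenced by the SQL query, collected beforehand.
    Set<String> referenced = Set.of("countryName", "time");
    System.out.println(prune(dimensions, referenced)); // [countryName]
    System.out.println(prune(metrics, referenced));    // []
  }
}
```

Preserving the original order keeps the pruned "dimensions" and "metrics" 
arrays stable relative to the unpruned query, which makes the two easy to diff.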



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
