[ https://issues.apache.org/jira/browse/CALCITE-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890902#comment-15890902 ]
Nishant Bangarwa commented on CALCITE-1656:
-------------------------------------------
[~julianhyde]: I have addressed the comments above. The one remaining point of
discussion is whether to include the number of fields queried in the cost
function of DruidQuery.
I discussed this with [~jcamachorodriguez], and it seems that DruidQuery may
not be the best place to adjust cost based on how many columns are read;
ideally that adjustment belongs in TableScan. However, the existing cost model
is built around the number of rows (i.e. cardinality), not the number of
columns that need to be scanned. IMO the number of columns scanned is an
important measure for columnar databases, and we should perhaps account for it
in TableScan, or add a new ColumnarTableScan that does. Any thoughts on this?
I think it may be OK to keep the number of fields as part of the DruidQuery
cost for now, until we improve our cost model to also include the number of
columns being scanned.
Do you agree?
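To make the proposal concrete, here is a minimal sketch of what "number of
fields as part of the cost" could look like. It is not Calcite's actual
DruidQuery code: the class ColumnAwareCost, the method scaleByColumns, and the
linear 1x-to-2x factor are all hypothetical, chosen only so that a plan
reading fewer columns comes out cheaper than one reading all of them.

```java
/** Illustrative only; not Calcite's DruidQuery cost implementation. */
public final class ColumnAwareCost {
    private ColumnAwareCost() {}

    /**
     * Scales a row-based cost by a factor that grows linearly from 1.0
     * (no fields read) to 2.0 (every field read). The linear shape and the
     * 2x ceiling are assumptions made for this sketch.
     */
    public static double scaleByColumns(double rowBasedCost, int fieldsRead, int totalFields) {
        if (totalFields <= 0 || fieldsRead < 0 || fieldsRead > totalFields) {
            throw new IllegalArgumentException(
                "expected 0 <= fieldsRead <= totalFields and totalFields > 0");
        }
        double factor = 1.0 + (double) fieldsRead / totalFields;
        return rowBasedCost * factor;
    }

    public static void main(String[] args) {
        double rowBasedCost = 100.0; // hypothetical cost from the row-count model
        int totalFields = 21;        // 16 dimensions + 5 metrics in the query quoted below

        double unpruned = scaleByColumns(rowBasedCost, totalFields, totalFields);
        double pruned = scaleByColumns(rowBasedCost, 1, totalFields); // e.g. only countryName

        System.out.println("unpruned cost = " + unpruned);
        System.out.println("pruned cost   = " + pruned);
    }
}
```

With a factor like this, the planner would prefer the plan in which the extra
columns have been pruned, even though both plans return the same number of
rows.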
> Sub-Optimal Druid Query planning - Does not Prune columns for DruidQuery
> -------------------------------------------------------------------------
>
> Key: CALCITE-1656
> URL: https://issues.apache.org/jira/browse/CALCITE-1656
> Project: Calcite
> Issue Type: Bug
> Components: druid
> Reporter: Nishant Bangarwa
> Assignee: Nishant Bangarwa
> Labels: performance
> Fix For: 1.12.0
>
>
> Consider the query below:
> {code}
> select "countryName", floor("time" to DAY), cast(count(*) as integer) as c
> from "wiki"
> where floor("time" to DAY) >= '1997-01-01 00:00:00' and
> floor("time" to DAY) < '1997-09-01 00:00:00'
> group by "countryName", floor("time" TO DAY)
> order by c limit 5
> {code}
> The resulting Druid query:
> {code}
> {
>   "queryType": "select",
>   "dataSource": "wikiticker",
>   "descending": false,
>   "intervals": [
>     "1900-01-09T00:00:00.000/2992-01-10T00:00:00.000"
>   ],
>   "dimensions": [
>     "channel",
>     "cityName",
>     "comment",
>     "countryIsoCode",
>     "countryName",
>     "isAnonymous",
>     "isMinor",
>     "isNew",
>     "isRobot",
>     "isUnpatrolled",
>     "metroCode",
>     "namespace",
>     "page",
>     "regionIsoCode",
>     "regionName",
>     "user"
>   ],
>   "metrics": [
>     "count",
>     "added",
>     "deleted",
>     "delta",
>     "user_unique"
>   ],
>   "granularity": "all",
>   "pagingSpec": {
>     "threshold": 16384,
>     "fromNext": true
>   },
>   "context": {
>     "druid.query.fetch": false
>   }
> }
> {code}
> Note that the above Druid query includes extra dimensions that are not required.