Jiarong Wei created CALCITE-1428:
------------------------------------
Summary: Inefficient execution plan of SELECT and LIMIT for Druid
Key: CALCITE-1428
URL: https://issues.apache.org/jira/browse/CALCITE-1428
Project: Calcite
Issue Type: Bug
Components: core, druid
Reporter: Jiarong Wei
Assignee: Julian Hyde
For SQLs like:
1. {{SELECT * FROM <table> LIMIT <row_count>}}
2. {{SELECT <all_columns_specified_explicitly> FROM <table> LIMIT <row_count>}}
{{DruidSortRule}} in Druid adapter does take effect and {{LIMIT}} is pushed
into {{DruidQuery}}. However the corresponding execution plan isn't chosen as
the best one. Thus Calcite will retrieve all data from Druid and purge all
unnecessary columns.
These are three SQLs and their corresponding execution plans below for dataset
{{wikiticker}} in Druid quickstart:
1. {{SELECT "cityName" FROM "wikiticker" LIMIT 5}}
{code}
rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])
rel#85:DruidQuery.BINDABLE.[](table=[default,
wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)
{code}
2. {{SELECT * FROM "wikiticker" LIMIT 5}}
{code}
rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)
rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])
rel#1:DruidQuery.BINDABLE.[](table=[default,
wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
{code}
3. {{SELECT "__time", "added", "channel", "cityName", "comment",
"commentLength", "count", "countryIsoCode", "countryName", "deleted", "delta",
"deltaBucket", "diffUrl", "flags", "isAnonymous", "isMinor", "isNew",
"isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode",
"regionName", "user", "user_unique" FROM "wikiticker" LIMIT 5}}
{code}
rel#42:EnumerableLimit.ENUMERABLE.[](input=rel#41:Subset#1.ENUMERABLE.[],fetch=5)
rel#113:EnumerableInterpreter.ENUMERABLE.[](input=rel#34:Subset#1.BINDABLE.[])
rel#52:BindableProject.BINDABLE.[](input=rel#4:Subset#0.BINDABLE.[],__time=$0,added=$1,channel=$2,cityName=$3,comment=$4,commentLength=$5,count=$6,countryIsoCode=$7,countryName=$8,deleted=$9,delta=$10,deltaBucket=$11,diffUrl=$12,flags=$13,isAnonymous=$14,isMinor=$15,isNew=$16,isRobot=$17,isUnpatrolled=$18,metroCode=$19,namespace=$20,page=$21,regionIsoCode=$22,regionName=$23,user=USER,user_unique=$25)
rel#1:DruidQuery.BINDABLE.[](table=[default,
wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
{code}
Notice that 2 and 3 should have {{LIMIT}} pushed to {{DruidQuery}} like 1 (and
should not have {{EnumerableLimit}})
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)