Julian, I think it's a good idea to always push the limit down. But regarding Junxian's case, there's a simpler fix. This is a bug in the cost computation that I have already reported: https://issues.apache.org/jira/browse/CALCITE-1842 The simple fix is to swap the two parameters. That will solve the problem and push the limit down.
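To illustrate the kind of bug in question, here is a minimal sketch. This is NOT the actual Calcite source; the class, method names, and numbers are hypothetical, though the argument order mirrors Calcite's RelOptCostFactory.makeCost(rowCount, cpu, io). It shows how swapping the first two arguments puts the large n*log(n) CPU figure into the row-count slot, inflating the apparent cost of a pushed-down Sort:

```java
// Minimal sketch, not Calcite's actual code: demonstrates the effect of
// passing (cpu, rowCount) to a makeCost(rows, cpu, io)-style factory.
public class CostSwapDemo {
  // Simplified cost value; in Calcite, the row count dominates comparisons.
  static final class Cost {
    final double rows, cpu, io;
    Cost(double rows, double cpu, double io) {
      this.rows = rows; this.cpu = cpu; this.io = io;
    }
  }

  // Buggy call site: rowCount and cpu swapped, so the large CPU figure
  // lands in the rows slot and inflates the Sort's apparent cost.
  static Cost buggyCost(double rowCount, double cpu) {
    return new Cost(cpu, rowCount, 0);
  }

  // Fixed call site: arguments in the order makeCost(rows, cpu, io) expects.
  static Cost fixedCost(double rowCount, double cpu) {
    return new Cost(rowCount, cpu, 0);
  }

  public static void main(String[] args) {
    double limit = 100;                        // rows kept by the LIMIT
    double cpu = limit * Math.log(limit) * 16; // ~ n log n * bytesPerRow
    System.out.printf("buggy rows slot: %.0f%n", buggyCost(limit, cpu).rows);
    System.out.printf("fixed rows slot: %.0f%n", fixedCost(limit, cpu).rows);
  }
}
```

With the swap, the "rows" component grows like n log n rather than n, so even a modest LIMIT can look more expensive than sorting in Calcite's own memory, which matches the behavior Junxian observed.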
-JD

> On Jun 30, 2017, at 6:29 PM, Julian Hyde <[email protected]> wrote:
>
> Is it true that we'd always want to push the limit down to Druid,
> regardless of whether the limit is large or small? If so, and if this
> is not happening, there is a bug in the cost model; please log it.
>
> On Fri, Jun 30, 2017 at 1:56 PM, Junxian Wu
> <[email protected]> wrote:
>> Hi Dev Community,
>> While using Calcite on Druid, I tried to run a simple query like
>> "select * from table_name limit 100". Although the SELECT query in
>> Druid is very inefficient, by limiting the output rows it can still
>> return the result quickly.
>> However, the cost computation can make the planner prefer to handle the
>> LIMIT RelNode (fetch() in the Sort node) in the Calcite JVM's memory.
>> In my case, when the LIMIT is larger than 7, the limit does not get
>> pushed down, and since the total amount of data is large, the JVM runs
>> out of memory. By changing the cost multiplier applied when the Sort
>> node is pushed down to a smaller number, a larger LIMIT can be pushed
>> down. This logic does not seem correct: when the LIMIT fetches more
>> rows, we should prefer to handle it on the database (Druid) side
>> instead of in memory. Should we redesign the cost computation so it
>> has the correct logic?
>> Thank you.
