Hi Dev Community,
While using Calcite on Druid, I tried to run a simple query like
"select * from table_name limit 100". Although a SELECT query in Druid is
quite inefficient, limiting the number of output rows still lets it return
results quickly.
However, the current cost computation can make the planner prefer to handle
the LIMIT (the fetch of the Sort RelNode) in the Calcite JVM's memory. In my
case, when the LIMIT is larger than 7, the limit does not get pushed into
Druid; since the total amount of data is large, this makes the JVM run out of
memory. By changing the cost multiplier applied when the Sort node is pushed
in to a smaller number, a larger LIMIT can be pushed in. This logic does not
seem correct: when the LIMIT fetches more rows, we should prefer to handle it
on the database (Druid) side rather than in memory. Should we redesign the
cost computation so that it has the correct logic?
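To make the argument concrete, here is a toy model of the cost comparison I
would expect the planner to make. This is NOT Calcite's actual cost code; the
class, method names, and row-count constant are all illustrative assumptions.
The point is that the cost of applying LIMIT in the JVM is dominated by the
full scan shipped over the wire, so pushing the limit into Druid should stay
cheaper as the fetch grows, not become more expensive:

```java
// Toy sketch of the expected cost logic (illustrative only; not Calcite code).
public class LimitCostSketch {
    // Assumed size of the Druid table, for illustration.
    static final double TABLE_ROWS = 1_000_000;

    // Handling LIMIT in the Calcite JVM: all rows cross the wire first.
    static double inMemoryLimitCost(long fetch) {
        return TABLE_ROWS;
    }

    // Pushing LIMIT into Druid: only `fetch` rows leave the database.
    static double pushedLimitCost(long fetch) {
        return fetch;
    }

    public static void main(String[] args) {
        for (long fetch : new long[] {7, 100, 10_000}) {
            boolean push = pushedLimitCost(fetch) < inMemoryLimitCost(fetch);
            System.out.println("LIMIT " + fetch + " -> push to Druid: " + push);
        }
    }
}
```

Under this model the push-down is cheaper for every realistic fetch size,
which is the opposite of what the current multiplier-based cost produces.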
Thank you.