Hi Dev Community,
While using Calcite on Druid, I tried to run a simple query like
"select * from table_name limit 100". Although a SELECT query in Druid is
quite inefficient, limiting the number of output rows still lets it return
results quickly.
However, the current cost computation can make the planner prefer to handle
the LIMIT (the fetch of the Sort RelNode) in the Calcite JVM's memory. In my
case, when the LIMIT is larger than 7, the limit does not get pushed into
Druid; since the total amount of data is large, this makes the JVM run out of
memory. By changing the cost multiplier applied when the Sort node is pushed
in to a smaller number, a larger LIMIT can be pushed in. This logic does not
seem correct: when the LIMIT fetches more rows, we should prefer to handle it
on the database (Druid) side rather than in memory. Should we redesign the
cost computation so that it has the correct logic?
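To make the argument concrete, here is a toy model of the cost comparison I
would expect the planner to make. This is NOT Calcite's actual cost code; the
class, method names, and row-count constant are all illustrative assumptions.
The point is that the cost of applying LIMIT in the JVM is dominated by the
full scan shipped over the wire, so pushing the limit into Druid should stay
cheaper as the fetch grows, not become more expensive:

```java
// Toy sketch of the expected cost logic (illustrative only; not Calcite code).
public class LimitCostSketch {
    // Assumed size of the Druid table, for illustration.
    static final double TABLE_ROWS = 1_000_000;

    // Handling LIMIT in the Calcite JVM: all rows cross the wire first.
    static double inMemoryLimitCost(long fetch) {
        return TABLE_ROWS;
    }

    // Pushing LIMIT into Druid: only `fetch` rows leave the database.
    static double pushedLimitCost(long fetch) {
        return fetch;
    }

    public static void main(String[] args) {
        for (long fetch : new long[] {7, 100, 10_000}) {
            boolean push = pushedLimitCost(fetch) < inMemoryLimitCost(fetch);
            System.out.println("LIMIT " + fetch + " -> push to Druid: " + push);
        }
    }
}
```

Under this model the push-down is cheaper for every realistic fetch size,
which is the opposite of what the current multiplier-based cost produces.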
Thank you.