[
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048793#comment-16048793
]
Harshvardhan Gupta commented on DERBY-6938:
-------------------------------------------
Bryan,
Regarding my doubt earlier, one thing that was particularly useful to dive deep
into the optimizer was to enable optimizer tracing.
https://wiki.apache.org/db-derby/OptimizerTracing
The trace output is quite verbose and helps to understand the various choices
the optimizer is making.
Few observations and scope of improvements that I would like to point out -
1) Derby falls back to nested loops more often that we would like to
particularly in case of large tables, currently the hash table resides entirely
in memory and derby rules out the HASHJOIN approach if it suspects that it is
going to be too large (default is 1048576)
Nested loops do not seem to be a good option specially when joining relatively
large tables (similar to imdb dataset we are using) across more than 4 joins.
It is also documented in the optimizer paper that creating hash tables that
spill to disk is a potential improvement and my experiments confirm that.
2) Another potential improvement with regards to cardinality estimates. Derby
currently uses hard wired numbers for every operator other than the equality op
for selectivity.
https://db.apache.org/derby/docs/10.0/manuals/tuning/perf56.html
In case of equality operator with a known value at compile time, it utilises
statistics and make selectivity assumptions using number of unique values. I
think we can enhance the statistics to be able to make better cardinality
estimates.
> Obtain cardinality estimates and true estimates for base tables as well as
> for intermediate results for queries involving multiple joins.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Harshvardhan Gupta
> Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)