[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048793#comment-16048793
 ] 

Harshvardhan Gupta commented on DERBY-6938:
-------------------------------------------

Bryan,
Regarding my doubt earlier, one thing that was particularly useful to dive deep 
into the optimizer was to enable optimizer tracing.
https://wiki.apache.org/db-derby/OptimizerTracing

The trace output is quite verbose and helps to understand the various choices 
the optimizer is making.

Few observations and scope of improvements that I would like to point out - 

1) Derby falls back to nested loops more often that we would like to 
particularly in case of large tables, currently the hash table resides entirely 
in memory and derby rules out the HASHJOIN approach if it suspects that it is 
going to be too large (default is 1048576)
Nested loops do not seem to be a good option specially when joining relatively 
large tables (similar to imdb dataset we are using) across more than 4 joins.

It is also documented in the optimizer paper that creating hash tables that 
spill to disk is a potential improvement and my experiments confirm that.

2) Another potential improvement with regards to cardinality estimates. Derby 
currently uses hard wired numbers for every operator other than the equality op 
for selectivity.
https://db.apache.org/derby/docs/10.0/manuals/tuning/perf56.html

In case of equality operator with a known value at compile time, it utilises 
statistics and make selectivity assumptions using number of unique values. I 
think we can enhance the statistics to be able to make better cardinality 
estimates. 


>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-6938
>                 URL: https://issues.apache.org/jira/browse/DERBY-6938
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>         Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to