[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6940:
--------------------------------------
    Attachment: DERBY-6940_3.diff

Bryan,

Thanks for the help. I was able to solve my problem by following your 
direction. Attached is the new patch. 

Regarding Derby statistics: information on the statistics that Postgres 
collects can be found at:

https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT#Using_ANALYZE_to_optimize_PostgreSQL_queries

https://www.postgresql.org/docs/9.0/static/catalog-pg-statistic.html

Postgres uses a null count and distribution buckets. We are currently trying a 
primitive version of the distribution buckets that assumes a uniform 
distribution between our min and max values, but we can certainly incorporate 
the Postgres behaviour.
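
To make the uniform-distribution assumption above concrete, here is a minimal 
sketch of how a selectivity estimate could be derived from a column's min 
value, max value, and null fraction. The class and method names are 
hypothetical and are not taken from Derby or from the attached patch; it also 
assumes the column values can be mapped to doubles.

```java
/** Hypothetical helper; not part of Derby or the attached patch. */
final class UniformSelectivity {

    /**
     * Estimated fraction of rows satisfying "col <= bound", assuming the
     * non-null values are spread uniformly over [min, max].
     */
    static double lessOrEqual(double bound, double min, double max,
                              double nullFraction) {
        double nonNull = 1.0 - nullFraction;   // NULLs never satisfy the predicate
        if (bound < min) {
            return 0.0;
        }
        if (bound >= max) {
            return nonNull;
        }
        // Here min <= bound < max, so max > min and the division is safe.
        return nonNull * (bound - min) / (max - min);
    }

    /** Estimated fraction of rows satisfying "col BETWEEN low AND high". */
    static double between(double low, double high, double min, double max,
                          double nullFraction) {
        double lo = Math.max(low, min);        // clip the range to [min, max]
        double hi = Math.min(high, max);
        if (lo > hi) {
            return 0.0;                        // empty or non-overlapping range
        }
        double nonNull = 1.0 - nullFraction;
        if (max == min) {
            return nonNull;                    // single distinct value, in range
        }
        return nonNull * (hi - lo) / (max - min);
    }

    public static void main(String[] args) {
        // Values in [0, 100], 20% NULLs, predicate "col BETWEEN 25 AND 75".
        System.out.println(between(25, 75, 0, 100, 0.2));
    }
}
```

Real distribution buckets would replace the single [min, max] interval with 
several narrower intervals, applying the same interpolation inside each one.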

Another interesting statistic is the average size of a row written to disk. 
It could be helpful in cost estimation, though I don't see its relevance to 
cardinality estimation.
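
As a rough illustration of why average row size matters for cost but not for 
cardinality, the sketch below estimates how many pages a scan would read. The 
names are hypothetical, the page size is an assumed parameter, and per-page 
and per-row storage overhead is ignored; this is not how Derby's optimizer 
computes cost.

```java
/** Hypothetical helper; not part of Derby or the attached patch. */
final class ScanCostSketch {

    /**
     * Estimated number of pages needed to hold rowCount rows, given the
     * average on-disk row size. Overhead for page and row headers is ignored.
     */
    static long estimatedPages(long rowCount, double avgRowBytes, int pageBytes) {
        if (avgRowBytes <= 0 || pageBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (rowCount <= 0) {
            return 0;
        }
        // At least one row per page, even if a row is larger than a page.
        double rowsPerPage = Math.max(1.0, Math.floor(pageBytes / avgRowBytes));
        return (long) Math.ceil(rowCount / rowsPerPage);
    }

    public static void main(String[] args) {
        // Same cardinality, different row widths, assumed 4096-byte pages.
        System.out.println(estimatedPages(10_000, 100.0, 4096));
        System.out.println(estimatedPages(10_000, 400.0, 4096));
    }
}
```

The row count passed in is the same in both calls; only the estimated I/O 
changes, which is why this statistic feeds cost rather than cardinality.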

> Enhance derby statistics for more accurate selectivity estimates.
> -----------------------------------------------------------------
>
>                 Key: DERBY-6940
>                 URL: https://issues.apache.org/jira/browse/DERBY-6940
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>         Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff
>
>
> Derby should collect extra statistics at index build time and statistics 
> refresh time, which will help the optimizer make more precise selectivity 
> estimates and choose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
