[
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Harshvardhan Gupta updated DERBY-6940:
--------------------------------------
Attachment: DERBY-6940_3.diff
Bryan,
Thanks for the help. I was able to solve my problem by following your
direction. Attached is the new patch.
Regarding the Derby statistics: information on the statistics that Postgres
keeps can be found at
https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT#Using_ANALYZE_to_optimize_PostgreSQL_queries
https://www.postgresql.org/docs/9.0/static/catalog-pg-statistic.html
Postgres uses a null count and distribution buckets. We are currently trying
to do a primitive version of the distribution buckets, assuming a uniform
distribution between our min and max values, but we can certainly
incorporate the Postgres behaviour as well.
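
As a rough illustration of the uniform-distribution assumption described
above, here is a minimal sketch of how a selectivity for "col < value" could
be derived from a column's min, max and null count. The class and method
names are hypothetical and are not taken from the attached patch or from the
Derby code base; the null handling simply reflects that a NULL never
satisfies a comparison in SQL.

```java
/** Hypothetical helper for illustration only; not part of Derby or the patch. */
final class UniformSelectivity {
    private UniformSelectivity() {}

    /**
     * Estimated fraction of all rows that satisfy "col < value", assuming
     * the non-null values are spread evenly over [min, max].
     */
    static double lessThan(double value, double min, double max,
                           long nullCount, long rowCount) {
        if (rowCount <= 0) {
            return 0.0; // empty table: nothing can qualify
        }
        // NULLs never satisfy a comparison, so only non-null rows can qualify.
        double nonNullFraction = (double) (rowCount - nullCount) / rowCount;
        if (max <= min) {
            // Degenerate range: every non-null value equals min.
            return value > min ? nonNullFraction : 0.0;
        }
        // Position of the constant inside [min, max], clamped to [0, 1].
        double position = (value - min) / (max - min);
        double clamped = Math.max(0.0, Math.min(1.0, position));
        return clamped * nonNullFraction;
    }
}
```

For example, with min = 0, max = 100, 1000 rows and 200 nulls, the estimate
for "col < 50" is 0.5 * 0.8 = 0.4. Real distribution buckets would replace
the single [min, max] interval with one such interpolation per bucket.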
Another interesting statistic is the average size of a row written to disk.
This stat could be helpful in cost estimation, though I don't see its
relevance to cardinality estimation.
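
To make the cost-estimation point concrete, here is a small sketch of how an
average row size could feed an estimate of the pages a full scan has to
read. The names are hypothetical, and the formula deliberately ignores
per-page overhead and free space, so it is only an approximation and not a
description of how Derby or Postgres computes scan cost.

```java
/** Hypothetical helper for illustration only; not part of Derby or the patch. */
final class ScanCostSketch {
    private ScanCostSketch() {}

    /**
     * Rough number of pages a full scan must read, given the row count,
     * the average on-disk row size in bytes, and the page size in bytes.
     */
    static long estimatedPages(long rowCount, double avgRowBytes, int pageBytes) {
        if (rowCount <= 0 || avgRowBytes <= 0 || pageBytes <= 0) {
            return 0; // no meaningful estimate for empty or invalid input
        }
        // Total bytes divided by page size, rounded up to whole pages.
        return (long) Math.ceil(rowCount * avgRowBytes / pageBytes);
    }
}
```

The row count does not change with the row size, which is why this figure
affects the cost of a plan but not its estimated cardinality.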
> Enhance derby statistics for more accurate selectivity estimates.
> -----------------------------------------------------------------
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Harshvardhan Gupta
> Assignee: Harshvardhan Gupta
> Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff
>
>
> Derby should collect extra statistics at index build time and at statistics
> refresh time, which will help the optimizer make more precise selectivity
> estimates and choose better execution paths.