[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

Harshvardhan Gupta (JIRA) Wed, 28 Jun 2017 12:40:26 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067113#comment-16067113
 ]


Harshvardhan Gupta commented on DERBY-6942:
-------------------------------------------

The estimateCost function in FromBaseTable.java is the guts of the cost and 
cardinality estimation logic for base table predicates.

The function is called for each of conglomerate (i.e each of the access path 
possible on the table such as indexes, full table scan). For predicates that 
can be utilized as start or stop predicates for the given conglomerate (refer 
http://db.apache.org/derby/docs/10.13/tuning/ctunoptimz24840.html), row 
estimates are obtained through the store by giving the actual constant keys 
specified in the query, the estimate by row is quite accurate as it finds out 
the fraction of values between the start and stop keys.

For all other remaining predicates one of the two things happen:

1) If an equality op is specified and statistics exist for the conglomerate 
(and it is not a valid start/stop predicate), we query statistics for 
selectivity.(it takes into account number of unique columns)

2) Hard wired selectivities are used for all other cases.


As part of DERBY-6940, we started collecting extra statistics such as null 
count, min and max value for each of the columns of the index. We would like to 
utilise them to eliminate usage of hard wired selectivities whenever possible.

For example - 

Let us say we have a table and an associated index created using the following 
statement - 

create table abcd(id integer);
create index idx on abcd(id);

Now, suppose a query is issued - 

select * from abcd where id is NULL;

There are two possible access paths for the query and both are considered by 
the query optimizer - 

1) Index Scan - The store is able to give away fairly accurate row estimates as 
'NULL' is a valid start as well as stop key.

2) Table scan - Here, hard wired selectivity estimates are used for row 
estimates as discussed above.

> Utilise additional statistics for selectivity estimates.
> --------------------------------------------------------
>
>                 Key: DERBY-6942
>                 URL: https://issues.apache.org/jira/browse/DERBY-6942
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

Reply via email to