[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

Bryan Pendleton (JIRA) Thu, 22 Jun 2017 19:46:23 -0700

     [ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Bryan Pendleton updated DERBY-6940:
-----------------------------------
    Attachment: EOFException.txt
                EOFException_derby.log

I see the EOFException, too, when I run that test with your DERBY-6940_3.diff 
applied.

I attached the terminal output of my test command, and also the derby.log file, 
so you can look at them in more detail and verify that I'm seeing the same 
behavior that you are seeing.

Looking back over my notes in DERBY-3219, it seems like my concern at the time 
was that the special-case handling for String objects in 
FormatIdOutputStream.writeExternal and FormatIdInputStream.readInternal was 
fragile, because it was ambiguous.

However, Mike made some very good points in DERBY-3219 about how it would be a 
substantial challenge to try to change that behavior at this time.

This is a challenge; it isn't obvious to me what we can do here. It seems like 
the hack that I did for MinMaxAggregator in DERBY-3219 won't work here, because 
the new StatisticsImpl object has both a Min and a Max field added to it, and 
we can't arrange for BOTH of those fields to be the last field, unfortunately 
(at most one of them can be the last one).

However, it does seem that having a string of length 0 as both the minimum 
and maximum value for a column will be quite a common special case when we 
compute statistics on an entirely empty table; perhaps we can decree some 
other way of recording those statistics (for example, if numRows is 0, then 
instead of recording a string of length 0, we record NULL).

That is, trying to restate the idea: does this problem only arise when numRows 
is 0? If so, does that give us a way to construct a workaround?
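
To make the workaround idea concrete, here is a minimal sketch of the 
"record NULL instead of a zero-length string when numRows is 0" rule. The 
class and method names are hypothetical and are not taken from the patch or 
from Derby's StatisticsImpl. The sketch only shows the substitution rule and 
does not touch the FormatIdOutputStream/FormatIdInputStream code paths.

```java
public class EmptyTableStats {
    /**
     * Hypothetical helper: decide what to record as a min or max statistic.
     * On an empty table (numRows == 0) we record null rather than whatever
     * placeholder value (such as a zero-length string) was computed.
     */
    public static String valueToRecord(long numRows, String computedValue) {
        if (numRows == 0) {
            return null; // nothing was scanned, so there is no real min/max
        }
        return computedValue;
    }

    public static void main(String[] args) {
        // Empty table: the zero-length string is replaced by null.
        System.out.println(valueToRecord(0, ""));
        // Non-empty table: the computed value is kept as-is.
        System.out.println(valueToRecord(3, "apple"));
    }
}
```

Note that this rule leaves a non-empty table whose minimum value really is 
the empty string untouched, so it only helps if the answer to the question 
above is "yes".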

> Enhance derby statistics for more accurate selectivity estimates.
> -----------------------------------------------------------------
>
>                 Key: DERBY-6940
>                 URL: https://issues.apache.org/jira/browse/DERBY-6940
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>         Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, 
> EOFException_derby.log, EOFException.txt
>
>
> Derby should collect extra statistics at index build time and at statistics 
> refresh time, which will help the optimizer make more precise selectivity 
> estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity 
> estimates / cost estimates that will help find the best query plan. Currently 
> Derby keeps two types of stats - the total row count and the number of unique 
> values.
> We are initially extending the stats to include null count, the minimum value 
> and maximum value associated with each of the columns of an index. This would 
> be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, 
> >= ] , all of which currently rely on hardwired selectivity estimates.
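
As an illustration of how a null count, minimum, and maximum could feed a 
selectivity estimate, here is a rough sketch. It is not Derby's optimizer 
code. It assumes numeric column values and a uniform distribution of non-null 
values between the minimum and maximum, and all names are hypothetical.

```java
public class SelectivitySketch {
    /** Estimated fraction of rows for which the column IS NULL. */
    public static double isNullSelectivity(long nullCount, long numRows) {
        if (numRows <= 0) {
            return 0.0; // empty table: no rows can qualify
        }
        return (double) nullCount / numRows;
    }

    /**
     * Estimated fraction of rows with column < value, assuming non-null
     * values are spread uniformly over [min, max] (an assumption of this
     * sketch, not something the statistics guarantee).
     */
    public static double lessThanSelectivity(double value, double min, double max,
                                             long nullCount, long numRows) {
        if (numRows <= 0) {
            return 0.0;
        }
        // NULLs never satisfy a comparison, so scale by the non-null fraction.
        double nonNullFraction = 1.0 - isNullSelectivity(nullCount, numRows);
        if (value <= min) {
            return 0.0;
        }
        if (value > max) {
            return nonNullFraction;
        }
        // Here min < value <= max, so max - min is strictly positive.
        return nonNullFraction * (value - min) / (max - min);
    }

    public static void main(String[] args) {
        System.out.println(isNullSelectivity(10, 100));
        System.out.println(lessThanSelectivity(50, 0, 100, 10, 100));
    }
}
```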



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
