[
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bryan Pendleton updated DERBY-6940:
-----------------------------------
Attachment: EOFException.txt
EOFException_derby.log
I see the EOFException, too, when I run that test, with your DERBY-6940_3.diff
applied.
I attached the terminal output of my test command, and also the derby.log file,
so you can look at them in more detail and verify that I'm seeing the same
behavior that you are seeing.
Looking back over my notes in DERBY-3219, it seems like my concern at the time
was that the special-case handling for String objects in
FormatIdOutputStream.writeExternal and FormatIdInputStream.readInternal was
fragile, because it was ambiguous.
However, Mike made some very good points in DERBY-3219 about how it would be a
substantial challenge to try to change that behavior at this time.
This is a challenge; it isn't obvious to me what we can do here. It seems like
the hack that I did for MinMaxAggregator in DERBY-3219 won't work here, because
the new StatisticsImpl object has both a Min and a Max field added to it, and
we can't arrange for BOTH of those fields to be the last field, unfortunately
(at most one of them can be the last one).
However, it does seem like having a string of length 0 as both the minimum
and the maximum value for a column will be quite a common special case,
since that is what we get when we compute statistics on an entirely empty
table. Perhaps we can decree some other way of recording those statistics
(for example, if numRows is 0, then instead of recording a string of length 0
we record NULL).
That is, trying to restate the idea: does this problem only arise when numRows
is 0? If so, does that give us a way to construct a workaround?
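To make the idea concrete, here is a minimal sketch of the "record NULL when
numRows is 0" workaround. It is an illustration only: ColumnStatsSketch is a
hypothetical stand-in, not the StatisticsImpl class from the patch, and it uses
plain java.io object streams rather than Derby's FormatIdOutputStream and
FormatIdInputStream, so it does not reproduce the EOFException itself. It only
shows the normalization step that keeps a zero-length string from ever being
written for an empty table.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the statistics object (NOT Derby's StatisticsImpl).
final class ColumnStatsSketch {
    final long numRows;
    final String min; // null means "no minimum recorded"
    final String max; // null means "no maximum recorded"

    ColumnStatsSketch(long numRows, String min, String max) {
        this.numRows = numRows;
        // Workaround under discussion: for an empty table, record NULL
        // instead of whatever (possibly zero-length) string was computed.
        this.min = (numRows == 0) ? null : min;
        this.max = (numRows == 0) ? null : max;
    }

    // Serialize with standard Java object streams (an assumption made for
    // this sketch; Derby uses its own format-id based streams).
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeLong(numRows);
                out.writeObject(min);
                out.writeObject(max);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static ColumnStatsSketch fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            long n = in.readLong();
            String min = (String) in.readObject();
            String max = (String) in.readObject();
            return new ColumnStatsSketch(n, min, max);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class EmptyTableStatsDemo {
    public static void main(String[] args) {
        ColumnStatsSketch empty =
            ColumnStatsSketch.fromBytes(new ColumnStatsSketch(0, "", "").toBytes());
        System.out.println("empty table: min=" + empty.min + ", max=" + empty.max);
    }
}
```

Whether this is sufficient depends on the answer to the question above: if a
zero-length min or max can also occur when numRows is greater than 0 (for
example, a column that really contains empty strings), this normalization alone
would not cover it.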
> Enhance derby statistics for more accurate selectivity estimates.
> -----------------------------------------------------------------
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Harshvardhan Gupta
> Assignee: Harshvardhan Gupta
> Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff,
> EOFException_derby.log, EOFException.txt
>
>
> Derby should collect extra statistics at index build time and at statistics
> refresh time, which will help the optimizer make more precise selectivity
> estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity
> estimates / cost estimates that will help find the best query plan. Currently
> Derby keeps two types of stats - the total row count and the number of unique
> values.
> We are initially extending the stats to include null count, the minimum value
> and maximum value associated with each of the columns of an index. This would
> be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >,
> >= ] , all of which currently rely on hardwired selectivity estimates.
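For illustration, here is one common way such statistics could feed into
selectivity estimates. This is a sketch under stated assumptions, not the
approach taken in the attached patches: it assumes a numeric column whose
non-null values are uniformly distributed between the recorded minimum and
maximum, and the class and method names (SelectivitySketch, isNull, lessThan)
are hypothetical.

```java
// Hypothetical helper, not part of Derby: estimates selectivity from
// row count, null count, min and max, assuming a uniform distribution.
final class SelectivitySketch {

    // Fraction of rows expected to satisfy "col IS NULL".
    static double isNull(long nullCount, long numRows) {
        if (numRows <= 0) {
            return 0.0; // empty table: nothing qualifies
        }
        return (double) nullCount / numRows;
    }

    // Fraction of rows expected to satisfy "col < v".
    static double lessThan(double v, double min, double max,
                           long nullCount, long numRows) {
        if (numRows <= 0) {
            return 0.0;
        }
        // NULLs never satisfy a comparison, so scale by the non-null share.
        double nonNull = 1.0 - isNull(nullCount, numRows);
        if (v <= min) {
            return 0.0;     // nothing is below the recorded minimum
        }
        if (v > max) {
            return nonNull; // every non-null value qualifies
        }
        // Here min < v <= max, so max - min is strictly positive.
        return nonNull * (v - min) / (max - min);
    }

    public static void main(String[] args) {
        System.out.println(isNull(10, 100));
        System.out.println(lessThan(50, 0, 100, 10, 100));
    }
}
```

The other range operators (<=, >, >=) would follow the same interpolation with
the obvious adjustments.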