[
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061206#comment-16061206
]
Harshvardhan Gupta commented on DERBY-6940:
-------------------------------------------
Hi Bryan,
I thought of a workaround and it was successful. I compare maxVal and minVal,
and if they are equal I first write an indicator boolean and then write only
one DataValueDescriptor object. In all other cases, I write the indicator,
then maxVal, and then minVal. This way the problematic object is always
written last, and only once.
public void writeExternal(ObjectOutput out)
    throws IOException
{
    FormatableHashtable fh = new FormatableHashtable();
    fh.putLong("numRows", numRows);
    fh.putLong("numUnique", numUnique);
    fh.putLong("nullCount", nullCount);
    out.writeObject(fh);

    // Decide up front whether a single value is enough. A finally block
    // would also run after an early return and write the values twice.
    boolean sameValue = false;
    try {
        sameValue = maxVal.equals(maxVal, minVal).getBoolean();
    }
    catch (StandardException e) {
        // Comparison failed: fall back to writing both values.
    }

    out.writeBoolean(sameValue);
    if (sameValue) {
        out.writeObject(minVal);
    }
    else {
        out.writeObject(maxVal);
        out.writeObject(minVal);
    }
}
public void readExternal(ObjectInput in)
    throws IOException, ClassNotFoundException
{
    FormatableHashtable fh = (FormatableHashtable)in.readObject();
    numRows = fh.getLong("numRows");
    numUnique = fh.getLong("numUnique");
    nullCount = fh.getLong("nullCount");
    if (in.readBoolean()) {
        maxVal = (DataValueDescriptor)in.readObject();
        minVal = maxVal.cloneValue(true);
    }
    else {
        maxVal = (DataValueDescriptor)in.readObject();
        minVal = (DataValueDescriptor)in.readObject();
    }
}
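
For anyone who wants to try the indicator-flag idea outside Derby, here is a
minimal, self-contained sketch. It is not the Derby code: MinMaxStats and
roundTrip are hypothetical names, and plain Integer values stand in for
DataValueDescriptor. It only illustrates that the reader consumes exactly
what the writer produced in both the equal and the non-equal case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.Objects;

// Hypothetical stand-in for the statistics class; Integer replaces
// DataValueDescriptor so the example runs without Derby on the classpath.
class MinMaxStats implements Externalizable {
    long numRows;
    Integer minVal;
    Integer maxVal;

    // Externalizable requires a public no-arg constructor.
    public MinMaxStats() { }

    MinMaxStats(long numRows, Integer minVal, Integer maxVal) {
        this.numRows = numRows;
        this.minVal = minVal;
        this.maxVal = maxVal;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(numRows);
        // The flag tells the reader whether one or two objects follow.
        boolean sameValue = Objects.equals(maxVal, minVal);
        out.writeBoolean(sameValue);
        if (sameValue) {
            out.writeObject(minVal);
        } else {
            out.writeObject(maxVal);
            out.writeObject(minVal);
        }
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        numRows = in.readLong();
        if (in.readBoolean()) {
            maxVal = (Integer) in.readObject();
            minVal = maxVal; // Integer is immutable, so sharing is safe here
        } else {
            maxVal = (Integer) in.readObject();
            minVal = (Integer) in.readObject();
        }
    }

    // Serializes to a byte array and reads the object back.
    static MinMaxStats roundTrip(MinMaxStats stats) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(stats);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (MinMaxStats) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling MinMaxStats.roundTrip(new MinMaxStats(10, 5, 5)) exercises the
single-object path, and MinMaxStats.roundTrip(new MinMaxStats(7, 1, 9))
exercises the two-object path.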
> Enhance derby statistics for more accurate selectivity estimates.
> -----------------------------------------------------------------
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Harshvardhan Gupta
> Assignee: Harshvardhan Gupta
> Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff,
> EOFException_derby.log, EOFException.txt
>
>
> Derby should collect extra statistics at index build time and at statistics
> refresh time, which will help the optimizer make more precise selectivity
> estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity
> estimates / cost estimates that will help find the best query plan. Currently
> Derby keeps two types of stats - the total row count and the number of unique
> values.
> We are initially extending the stats to include null count, the minimum value
> and maximum value associated with each of the columns of an index. This would
> be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >,
> >= ], all of which currently rely on hardwired selectivity estimates.
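
To make the description above concrete, here is a rough sketch of how the
null count, minimum and maximum could feed a selectivity estimate. It is
not Derby's optimizer code. SelectivitySketch and its methods are
hypothetical names, the column is treated as numeric, and the range estimate
assumes values are spread uniformly between the minimum and the maximum,
which is an assumption of this sketch and not something the issue specifies.

```java
// Hypothetical helper, not part of Derby.
final class SelectivitySketch {

    // Estimated fraction of rows satisfying "col IS NULL".
    static double isNull(long numRows, long nullCount) {
        return numRows == 0 ? 0.0 : (double) nullCount / numRows;
    }

    // Estimated fraction of rows satisfying "col < c", assuming non-null
    // values are uniformly distributed over [minVal, maxVal].
    static double lessThan(long numRows, long nullCount,
                           double minVal, double maxVal, double c) {
        if (numRows == 0) {
            return 0.0;
        }
        // NULLs never satisfy a comparison, so only non-null rows count.
        double nonNullFraction = (double) (numRows - nullCount) / numRows;
        if (c <= minVal) {
            return 0.0;
        }
        if (c > maxVal) {
            return nonNullFraction;
        }
        // Here minVal < c <= maxVal, so the denominator is positive.
        return nonNullFraction * (c - minVal) / (maxVal - minVal);
    }
}
```

For example, with 100 rows, 20 of them NULL, and values ranging from 0 to
10, the sketch estimates "col < 5" at 0.8 * 5 / 10 = 0.4.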