Re: [HACKERS] proposal : cross-column stats

Tomas Vondra Mon, 13 Dec 2010 11:23:06 -0800

Dne 13.12.2010 18:59, Joshua Tolley napsal(a):
> On Sun, Dec 12, 2010 at 07:10:44PM -0800, Nathan Boley wrote:
>> Another quick note: I think that storing the full contingency table is
>> wasteful since the marginals are already stored in the single column
>> statistics. Look at copulas [2] ( FWIW I think that Josh Tolley was
>> looking at this a couple years back ).
> 
> Josh Tolley still looks at it occasionally, though time hasn't permitted any
> sort of significant work for quite some time. The multicolstat branch on my
> git.postgresql.org repository will create an empirical copula each
> multi-column index, and stick it in pg_statistic. It doesn't yet do anything
> useful with that information, nor am I convinced it's remotely bug-free. In a
> brief PGCon discussion with Tom a while back, it was suggested a good place
> for the planner to use these stats would be clausesel.c, which is responsible
> for handling code such as "...WHERE foo > 4 AND foo > 5".


Well, that's good news ;-)

I've done a bit of research today, and I've found some quite interesting
papers on this topic (probably, I did not have time to read them, in
most cases I've read just the title and abstract).

[1] Selectivity estimators for multidimensional range queries over real
    attributes
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.914

    This seems like a very good starting point. AFAIK it precisely
    describes what data need to be collected, how to do the math etc.

[2] Selectivity Estimation Without the Attribute Value Independence
    Assumption
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.8126

    This obviously deals with the independence problem. Haven't
    investigated it further, but it seems worth to read.

[3] On Analyzing the Cost of Queries with Multi-Attribute Restrictions
    and Sort Operations (A Cost Function for Uniformly Partitioned
    UB-Trees)
    http://mistral.in.tum.de/results/publications/MB00_ideas.pdf

    Describes something called UB-Tree, and shows how it may be used to
    do estimates. Might be interesting as an alternative to the
    traditional histograms.

    There are more details about UB-Trees at

    http://mistral.in.tum.de/results/publications/

[4] http://www.dbnet.ece.ntua.gr/~nikos/edith/qopt_bibl/

    A rather nice collection of papers related to estimation (including
    some of the papers listed above).

Hm, I planned to finally read the "Understanding MySQL Internals" over
the Xmas ... that obviously won't happen.

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal : cross-column stats

Reply via email to