Re: [HACKERS] proposal : cross-column stats

Yeb Havinga Mon, 13 Dec 2010 01:51:57 -0800

On 2010-12-13 03:28, Robert Haas wrote:

Well, I'm not real familiar with contingency tables, but it seems like
you could end up needing to store a huge amount of data to get any
benefit out of it, in some cases.  For example, in the United States,
there are over 40,000 postal codes, and some even larger number of
city names, and doesn't the number of entries go as O(m*n)?  Now maybe
this is useful enough anyway that we should Just Do It, but it'd be a
lot cooler if we could find a way to give the planner a meaningful
clue out of some more compact representation.

A sparse matrix that holds only 'implicative' (P(A|B) <> P(A*B)?)
combinations? Also, some information might be deduced from others. For
Heikki's city/region example, for each city it would be known that it is
100% in one region. In that case it suffices to store only that
information, since 0% in all other regions can be deduced. I wouldn't be
surprised if storing implicatures like this would reduce the size to O(n).
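
A minimal sketch of the city/region idea, in Python rather than anything
resembling planner code. The function names, the threshold parameter and
the sample rows are all hypothetical and only illustrate the storage
argument: one entry is kept per city whose rows all fall in a single
region, and the 0% cells are deduced instead of stored.

```python
from collections import Counter


def build_implications(rows, threshold=1.0):
    """Keep one (city -> region) entry per city whose rows fall in a
    single region at least `threshold` of the time (hypothetical helper)."""
    rows = list(rows)
    pair_counts = Counter(rows)
    city_counts = Counter(city for city, _ in rows)
    implications = {}
    for (city, region), n in pair_counts.items():
        if n / city_counts[city] >= threshold:
            implications[city] = region
    return implications


def conditional_probability(implications, city, region):
    """Return P(region | city) from the stored implications, or None
    when nothing is stored for that city."""
    implied = implications.get(city)
    if implied is None:
        return None
    # Only the 100% cell is stored; every other region is deduced as 0%.
    return 1.0 if implied == region else 0.0


# Illustrative sample only: 4 distinct cities, 3 distinct regions.
sample = [
    ("Amsterdam", "Noord-Holland"),
    ("Amsterdam", "Noord-Holland"),
    ("Haarlem", "Noord-Holland"),
    ("Utrecht", "Utrecht"),
    ("Rotterdam", "Zuid-Holland"),
]
implications = build_implications(sample)
```

For this sample a full contingency table would need 4 * 3 = 12 cells,
while the implication map holds 4 entries, one per city. A city that
spans several regions simply gets no entry here and would need some
other statistic.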


regards,
Yeb Havinga


