On 2010-12-13 03:28, Robert Haas wrote:
Well, I'm not real familiar with contingency tables, but it seems like
you could end up needing to store a huge amount of data to get any
benefit out of it, in some cases.  For example, in the United States,
there are over 40,000 postal codes, and some even larger number of
city names, and doesn't the number of entries go as O(m*n)?  Now maybe
this is useful enough anyway that we should Just Do It, but it'd be a
lot cooler if we could find a way to give the planner a meaningful
clue out of some more compact representation.
A sparse matrix that holds only 'implicative' (P(A|B) <> P(A*B)?) combinations? Also, some information might be deduced from others. For Heikki's city/region example, for each city it would be known that it is 100% in one region. In that case it suffices to store only that information, since 0% in all other regions can be deduced. I wouldn't be surprised if storing implicatures like this would reduce the size to O(n).
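To make the idea a bit more concrete, here is a rough Python sketch, not planner code. It assumes we can scan (city, region) pairs from a sample; the function names build_implications and estimate_selectivity are made up for illustration. It keeps one entry per city that always occurs with a single region, and falls back to the usual independence assumption for everything else.

```python
from collections import Counter

def build_implications(rows):
    """Return (p_a, p_b, implications) for a list of (a, b) pairs.

    implications maps a -> b only when a was seen with exactly one b,
    so it holds at most one entry per distinct a: O(n), not O(m*n).
    """
    rows = list(rows)
    total = len(rows)
    p_a = {a: c / total for a, c in Counter(a for a, _ in rows).items()}
    p_b = {b: c / total for b, c in Counter(b for _, b in rows).items()}
    seen = {}
    for a, b in rows:
        seen.setdefault(a, set()).add(b)
    implications = {a: next(iter(bs)) for a, bs in seen.items() if len(bs) == 1}
    return p_a, p_b, implications

def estimate_selectivity(a, b, p_a, p_b, implications):
    """Estimate P(A = a AND B = b)."""
    if a in implications:
        # a is 100% in one b: the joint is P(a) there and 0 everywhere else.
        return p_a[a] if implications[a] == b else 0.0
    # No stored implication: assume independence (hypothetical fallback).
    return p_a.get(a, 0.0) * p_b.get(b, 0.0)

# Toy sample, invented for this sketch.
sample = [("Amsterdam", "NH"), ("Amsterdam", "NH"),
          ("Haarlem", "NH"), ("Rotterdam", "ZH")]
p_city, p_region, city_to_region = build_implications(sample)
```

With this toy sample, the estimate for Amsterdam/NH is simply P(Amsterdam), and the estimate for Amsterdam/ZH is zero, whereas multiplying the marginals would give a nonzero guess for both. Cities that span more than one region would still need some other treatment; the sketch just ignores them.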

regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
