> Actually, there's a second large problem with this patch: blindly
> iterating through all combinations of MCV and histogram entries makes the
> runtime O(N^2) in the statistics target. I made up some test data (by
> scanning my mail logs) and observed the following planning times, as
> reported by EXPLAIN ANALYZE:
>
> explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
> explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip;
>
>	stats target	eqjoinsel	networkjoinsel
>
>	100		0.27 ms		1.85 ms
>	1000		2.56 ms		167.2 ms
>	10000		56.6 ms		13987.1 ms
>
> I don't think it's necessary for network selectivity to be quite as
> fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
> time for a query that might need just milliseconds to execute :-(
>
> It seemed to me that it might be possible to reduce the runtime by
> exploiting knowledge about the ordering of the histograms, but
> I don't have time to pursue that right now.
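To restate where the O(N^2) comes from: every MCV and histogram entry
of one side is compared against every entry of the other side. Below
is a simplified, self-contained sketch of that nested-loop shape; it
uses made-up integer ranges rather than the real inet machinery, so it
is not the actual networkjoinsel code.

#include <stdio.h>

/* Stand-in for one MCV or histogram entry: a network as a [lo, hi]
 * range over a toy integer key space, not the real inet type. */
typedef struct
{
	int			lo;
	int			hi;
} net_range;

/*
 * Every left entry is compared against every right entry, so the
 * number of comparisons grows as O(N^2) in the statistics target N.
 */
static double
overlap_fraction(const net_range *lhs, int nlhs,
				 const net_range *rhs, int nrhs)
{
	double		matches = 0.0;
	int			i,
				j;

	for (i = 0; i < nlhs; i++)
		for (j = 0; j < nrhs; j++)
			if (lhs[i].lo <= rhs[j].hi && rhs[j].lo <= lhs[i].hi)
				matches += 1.0;

	return matches / ((double) nlhs * (double) nrhs);
}

int
main(void)
{
	net_range	a[] = {{0, 3}, {4, 7}, {8, 15}};
	net_range	b[] = {{2, 5}, {12, 12}};

	printf("%.3f\n", overlap_fraction(a, 3, b, 2));
	return 0;
}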
I made it break the loop once we have passed the last possible match:
the histogram boundaries are sorted, so no later bucket can match after
that point. Patch attached. It reduces the runtime by almost 50% with
large histograms.
We can also make it use only some of the MCV and histogram elements
for join estimation. I experimented with reducing the right and the
left hand sides of the lists in previous versions, and I remember it
was better to reduce only the left hand side. I think it would be
enough to use log(n) elements of the right hand side MCV and
histogram. I can make that change, if you think the selectivity
estimation function is the right place for this optimization.
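A rough sketch of the sampling idea is below. It is illustration only,
not part of the attached patch; the function name and the exact stride
choice are made up.

#include <math.h>
#include <stdio.h>

/*
 * Visit only about log2(n) evenly spaced entries of an n-element MCV
 * or histogram array instead of all of them.  The real change would
 * live inside the join selectivity estimator.
 */
static void
visit_sampled_entries(int nvalues)
{
	int			target = (int) ceil(log2(nvalues < 2 ? 2 : nvalues));
	int			step = nvalues / target;
	int			i;

	if (step < 1)
		step = 1;

	for (i = 0; i < nvalues; i += step)
		printf("use entry %d of %d\n", i, nvalues);
}

int
main(void)
{
	visit_sampled_entries(10000);	/* visits ~15 entries, not all 10000 */
	return 0;
}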
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index f854847..16f39db 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -612,20 +612,23 @@ inet_hist_value_sel(Datum *values, int nvalues, Datum constvalue,
 		return 0.0;
 
 	query = DatumGetInetPP(constvalue);
 
 	/* "left" is the left boundary value of the current bucket ... */
 	left = DatumGetInetPP(values[0]);
 	left_order = inet_inclusion_cmp(left, query, opr_codenum);
 
 	for (i = 1; i < nvalues; i++)
 	{
+		if (left_order == 256)
+			break;
+
 		/* ... and "right" is the right boundary value */
 		right = DatumGetInetPP(values[i]);
 		right_order = inet_inclusion_cmp(right, query, opr_codenum);
 
 		if (left_order == 0 && right_order == 0)
 		{
 			/* The whole bucket matches, since both endpoints do. */
 			match += 1.0;
 		}
 		else if ((left_order <= 0 && right_order >= 0) ||
@@ -854,20 +857,23 @@ inet_opr_codenum(Oid operator)
  */
 static int
 inet_inclusion_cmp(inet *left, inet *right, int opr_codenum)
 {
 	if (ip_family(left) == ip_family(right))
 	{
 		int			order;
 
 		order = bitncmp(ip_addr(left), ip_addr(right),
 						Min(ip_bits(left), ip_bits(right)));
+		if (order > 0)
+			return 256;
+
 		if (order != 0)
 			return order;
 
 		return inet_masklen_inclusion_cmp(left, right, opr_codenum);
 	}
 
 	return ip_family(left) - ip_family(right);
 }
 
 /*
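For anyone skimming the diff, here is a self-contained toy version of
the same early-exit pattern, with plain integers standing in for the
sorted histogram boundaries and the query value, none of the inet
machinery:

#include <stdio.h>

/*
 * Toy version of the patched histogram scan: walk the sorted bucket
 * boundaries, but stop as soon as the current left boundary is already
 * past the query, because no later bucket can match after that.
 */
static int
count_matching_buckets(const int *bounds, int nbounds, int query)
{
	int			matches = 0;
	int			i;

	for (i = 1; i < nbounds; i++)
	{
		if (bounds[i - 1] > query)
			break;				/* passed the last possible match */

		if (query <= bounds[i])
			matches++;			/* bucket [bounds[i-1], bounds[i]] matches */
	}

	return matches;
}

int
main(void)
{
	int			bounds[] = {1, 3, 5, 8, 13, 21, 34};

	printf("%d\n", count_matching_buckets(bounds, 7, 5));
	return 0;
}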