Hi Amit and all,

On 18.05.2016 04:26, Amit Langote wrote:
On 2016/05/18 2:22, Tom Lane wrote:
The two ways that we've dealt with this type of hazard are to copy data
out of the relcache before using it; or to give the relcache the
responsibility of not moving a particular portion of data if it did not
change.  From memory, the latter applies to the tuple descriptor and
trigger data, but we've done most other things the first way.
It seems that tuple descriptor is reference-counted; however trigger data
is copied.  The former seems to have been done on performance grounds (I
found 06e10abc).

So for a performance-sensitive relcache data structure, refcounting is the
way to go (although done quite rarely)?  In this particular case, it is a
"partition descriptor" that could get big for a partitioned table with
partitions in hundreds or thousands.

Thanks,
Amit

Here is an experimental patch that optimizes planning time for range partitioned tables (it could be considered as a "proof of concept"). Patch should be applied on top of Amit's declarative partitioning patch. It handles only a very special case (often used though) where partitioning key consists of just a single attribute and doesn't contain expressions.

The main idea is the following:
* we are looking for clauses like 'VAR OP CONST' (where VAR is partitioning key attribute, OP is a comparison operator);
* using binary search find a partition (X) that fits CONST value;
* based on OP operator determine which partitions are also covered by clause. There are possible cases: 1. If OP is '<' or '<=' then we need partitions standing left from X (including) 2. If OP is '>' or '>=' then we need partitions standing right from X (including)
   3. If OP is '=' the we need only X partition
(for '<' and '>' operators we also check if CONST value is equal to a lower or upper boundary (accordingly) and if it's true then exclude X).

For boolean expressions we evaluate left and right sides accordingly to algorithm above and then based on boolean operator find intersection (for AND) or union (for OR).

I run some benchmarks on:
1. original constraint exclusion mechanism,
2. optimized version (this patch) and
3. optimized version using relation->rd_partdesc pointer instead of RelationGetPartitionDesc() function (see previous discussion).

Initial conditions:

CREATE TABLE abc (id SERIAL NOT NULL, a INT, dt TIMESTAMP) PARTITION BY RANGE (a);
CREATE TABLE abc_1 PARTITION OF abc FOR VALUES START (0) END (1000);
CREATE TABLE abc_2 PARTITION OF abc FOR VALUES START (1000) END (2000);
...
etc
INSERT INTO %s (a) SELECT generate_series(0, <partitions_count> * 1000);

pgbench scripts: https://gist.github.com/zilder/872e634a8eeb405bd045465fc9527e53 (where :partitions is a number of partitions). The first script tests fetching a single row from the partitioned table. Results (tps):

# of partitions | constraint excl. | optimized | optimized (using pointer)
----------------+------------------+---------------+----------------------------
            100 |              658 |          2906 | 3079
           1000 |               45 |          2174 | 3021
           2000 |               22 |          1667 | 2919


The second script tests fetching all data from a single partition. Results (tps):

# of partitions | constraint excl. | optimized | optimized (using pointer)
----------------+------------------+---------------+----------------------------
            100 |              317 |          1001 | 1051
           1000 |               34 |           941 | 1023
           2000 |               15 |           813 | 1016

Optimized version works much faster on large amount of partitions and degradates slower than constraint exclusion. But still there is a noticeable performance degradation from copying PartitionDesc structure: with 2000 partitions RelationGetPartitionDesc() function spent more than 40% of all execution time on copying in first benchmark (measured with `perf`). Using reference counting as Amit suggests will allow to significantily decrease performance degradation.

Any comments and suggestions are very welcome. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index abec2b8..cd4c727 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -47,6 +47,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+// #include "utils/rangeset.h"
 
 /*
  * Information about a single partition
@@ -77,6 +78,29 @@ typedef struct PartitionDescData
 	PartitionRangeBound **rangeuppers;	/* range upper bounds */
 } PartitionDescData;
 
+/*
+ * Context
+ */
+typedef struct walk_expr_tree_context
+{
+	Oid				parentId;
+	Index			index;
+	PartitionKey	key;
+	PartitionDesc	desc;
+} walk_expr_tree_context;
+
+typedef struct var_const_pair
+{
+	Var		*pvar;
+	Const	*pconst;
+} var_const_pair;
+
+typedef struct IndexRange
+{
+	uint32 lower;
+	uint32 upper;
+} IndexRange;
+
 static PartitionKey CopyPartitionKey(PartitionKey fromkey);
 static KeyTypeCollInfo *copy_key_type_coll_info(int nkeycols,
 								KeyTypeCollInfo *tcinfo);
@@ -142,10 +166,27 @@ static int list_partition_for_tuple(PartitionKey key, PartitionDesc pdesc,
 static int range_partition_for_tuple(PartitionKey key, PartitionDesc pdesc,
 							Datum *values);
 static int range_partition_bsearch(PartitionKey key, PartitionDesc pdesc,
-						Datum *values);
+						Datum *values, int *nearest_pos);
 static bool rightof(PartitionKey key, Datum *values, PartitionRangeBound *bound);
 static bool leftof(PartitionKey key, Datum *values, PartitionRangeBound *bound);
 
+static List *make_all_partitions_rangeset(PartitionDesc desc);
+static bool lock_partition(Oid part_relid, LOCKMODE lockmode);
+static List *get_partitions_for_expr(const Expr *expr, walk_expr_tree_context *ctx);
+static List *get_partitions_for_boolexpr(const BoolExpr *expr, walk_expr_tree_context *ctx);
+static List *get_partitions_for_opexpr(const OpExpr *expr, walk_expr_tree_context *ctx);
+static List *get_range_partitions_for_opexpr(const OpExpr *expr, walk_expr_tree_context *ctx);
+static List *get_range_partitions_for_arropexpr(const ScalarArrayOpExpr *expr, walk_expr_tree_context *ctx);
+static bool pull_var_const(const OpExpr *expr, var_const_pair *pair);
+
+static IndexRange *make_range(uint32 lower, uint32 upper);
+static bool ranges_intersect(const IndexRange *a, const IndexRange *b);
+static bool ranges_conjuncted(const IndexRange *a, const IndexRange *b);
+static IndexRange *make_range_intersection(const IndexRange *a, const IndexRange *b);
+static IndexRange *make_range_union(const IndexRange *a, const IndexRange *b);
+static List *rangeset_union(List *a, List *b);
+static List *rangeset_intersection(List *a, List *b);
+
 /*
  * StorePartitionKey
  *		Store the partition keys of rel into pg_partitioned catalog
@@ -2230,7 +2271,7 @@ range_partition_for_tuple(PartitionKey key, PartitionDesc pdesc, Datum *values)
 {
 	Assert(pdesc->nparts > 0);
 
-	return range_partition_bsearch(key, pdesc, values);
+	return range_partition_bsearch(key, pdesc, values, NULL);
 }
 
 /*
@@ -2239,16 +2280,17 @@ range_partition_for_tuple(PartitionKey key, PartitionDesc pdesc, Datum *values)
  */
 static int
 range_partition_bsearch(PartitionKey key, PartitionDesc pdesc,
-						Datum *values)
+						Datum *values, int *nearest_pos)
 {
 	int		low, high;
+	int		idx = -1;
 
 	/* Good ol' bsearch */
 	low = 0;
 	high = pdesc->nparts - 1;
 	while (low <= high)
 	{
-		int		idx = (low + high) / 2;
+		idx = (low + high) / 2;
 
 		if (pdesc->rangeuppers[idx]->infinite)
 		{
@@ -2272,6 +2314,9 @@ range_partition_bsearch(PartitionKey key, PartitionDesc pdesc,
 		low = idx + 1;
 	}
 
+	if (nearest_pos)
+		*nearest_pos = idx;
+
 	return -1;
 }
 
@@ -2298,3 +2343,603 @@ leftof(PartitionKey key, Datum *values, PartitionRangeBound *bound)
 
 	return cmpval < 0;
 }
+
+/*
+ * get_partitions_by_quals
+ *		Returns partitions OIDs according to qual clauses
+ */
+List *
+get_partitions_by_quals(Expr *quals, Oid relationId, Index rti, LOCKMODE lockmode)
+{
+	ListCell *lc;
+	walk_expr_tree_context ctx;
+	Relation relation;
+	List *partitions = NIL;
+	List *ranges;
+	int i;
+
+	relation = RelationIdGetRelation(relationId);
+	ctx.parentId = relationId;
+	ctx.index = rti;
+	ctx.key = relation->rd_partkey;
+	ctx.desc = RelationGetPartitionDesc(relation);
+
+	/* If there is an empty quals list then just return all partitions */
+	if (!quals)
+	{
+		ranges = list_make1(make_range(0, ctx.desc->nparts - 1));
+		goto build_partitions_list;
+	}
+
+	/*
+	 * TODO: at the moment we do not determine partitions by quals for
+	 * multicolumn partitioning and partitioning by expressions
+	 */
+	if (ctx.key->partnatts > 1 || list_length(ctx.key->partexprs) > 0)
+	{
+		ranges = make_all_partitions_rangeset(ctx.desc);
+		goto build_partitions_list;
+	}
+
+	ranges = get_partitions_for_expr(quals, &ctx);
+
+build_partitions_list:
+	/*
+	 * Acquire locks and build the result list.
+	 */
+	partitions = lappend_oid(partitions, relationId);
+
+	/* Transform rangeset to actual OIDs and lock partitions */
+	foreach (lc, ranges)
+	{
+		IndexRange *range = (IndexRange *) lfirst(lc);
+
+		for (i=range->lower; i <= range->upper; i++)
+			if (lock_partition(ctx.desc->oids[i], lockmode))
+				partitions = lappend_oid(partitions, ctx.desc->oids[i]);
+	}
+
+	RelationClose(relation);
+
+	return partitions;
+}
+
+static List *
+make_all_partitions_rangeset(PartitionDesc desc)
+{
+	return list_make1(make_range(0, desc->nparts - 1));
+}
+
+/*
+ * lock_partition
+ *		Set lock on partition
+ */
+static bool
+lock_partition(Oid part_relid, LOCKMODE lockmode)
+{
+	if (lockmode != NoLock)
+	{
+		/* Get the lock to synchronize against concurrent drop */
+		LockRelationOid(part_relid, lockmode);
+
+		/*
+		 * Now that we have the lock, double-check to see if the relation
+		 * really exists or not.  If not, assume it was dropped while we
+		 * waited to acquire lock, and ignore it.
+		 */
+		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(part_relid)))
+		{
+			/* Release useless lock */
+			UnlockRelationOid(part_relid, lockmode);
+			/* And ignore this relation */
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * get_partitions_for_expr
+ *		Determine rangeset of partitions corresponding to an expression
+ */
+static List *
+get_partitions_for_expr(const Expr *expr, walk_expr_tree_context *ctx)
+{
+	BoolExpr		   *boolexpr;
+	OpExpr			   *opexpr;
+	ScalarArrayOpExpr  *arrexpr;
+
+	switch (expr->type)
+	{
+		/* AND, OR, NOT expressions */
+		case T_BoolExpr:
+			boolexpr = (BoolExpr *) expr;
+			return get_partitions_for_boolexpr(boolexpr, ctx);
+
+		/* =, !=, <, > etc. */
+		case T_OpExpr:
+			opexpr = (OpExpr *) expr;
+			return get_partitions_for_opexpr(opexpr, ctx);
+
+		/* IN expression */
+		case T_ScalarArrayOpExpr:
+			arrexpr = (ScalarArrayOpExpr *) expr;
+			return get_range_partitions_for_arropexpr(arrexpr, ctx);
+		default:
+			return make_all_partitions_rangeset(ctx->desc);
+	}
+	return NIL;
+}
+
+/*
+ * get_partitions_for_boolexpr
+ *		Determine rangeset of partitions for BoolExpr
+ *
+ * Function calculates rangesets for every operand and then depending on boolean
+ * operator it intersects (AND) or unites (OR) them
+ */
+static List *
+get_partitions_for_boolexpr(const BoolExpr *expr, walk_expr_tree_context *ctx)
+{
+	ListCell   *lc;
+	Expr	   *arg;
+	List	   *result = NIL,
+			   *ranges;
+
+	if (expr->boolop == AND_EXPR)
+		result = make_all_partitions_rangeset(ctx->desc);
+
+	foreach (lc, expr->args)
+	{
+		arg = (Expr *) lfirst(lc);
+
+		ranges = get_partitions_for_expr(arg, ctx);
+		switch (expr->boolop)
+		{
+			case AND_EXPR:
+				result = rangeset_intersection(result, ranges);
+				break;
+			case OR_EXPR:
+				result = rangeset_union(result, ranges);
+				break;
+			default:
+				break;
+		}
+	}
+
+	return result;
+}
+
+/*
+ * get_partitions_for_opexpr
+ *		Determine rangeset of partitions for a given OpExpr
+ */
+static List *
+get_partitions_for_opexpr(const OpExpr *expr, walk_expr_tree_context *ctx)
+{
+	switch (ctx->key->strategy)
+	{
+		case PARTITION_STRAT_RANGE:
+			return get_range_partitions_for_opexpr(expr, ctx);
+
+		/* TODO: */
+		case PARTITION_STRAT_LIST:
+		default:
+			return make_all_partitions_rangeset(ctx->desc);
+	}
+}
+
+/*
+ * get_range_partitions_for_opexpr
+ *		Determine partitions rangeset of the RANGE partitioned relation for the
+ *		expressions like 'VAR OP CONST'
+ *
+ * It uses binary search to find position (pos) of partition in PartitionDesc
+ * corresponding to the CONST value. Depending on operator there are cases:
+ *
+ *   * <, <=  --  we need partitions located left from the pos (including pos);
+ *   * =  -- we need the exact found partition;
+ *   * >, >=  --  we need partitions located right from the pos (including pos).
+ *
+ * If operator is '<' and CONST value is on the lower boundary of partition then
+ * exclude pos from rangeset. Similarly for '>' operator.
+ */
+static List *
+get_range_partitions_for_opexpr(const OpExpr *expr, walk_expr_tree_context *ctx)
+{
+	var_const_pair pair;
+	int		strategy;
+	int		pos = -1;
+	int		nearest_pos = -1;
+	uint32	lower=0,
+			upper=ctx->desc->nparts - 1;
+
+	/*
+	 * Get Var and Const from OpEpxr. If Var belong to another relation
+	 * then just return all partitions.
+	 */
+	if (!pull_var_const(expr, &pair) || pair.pvar->varno != ctx->index ||
+		pair.pvar->varattno != ctx->key->partattrs[0])
+		return list_make1(make_range(lower, upper));
+
+	/* Use binary search to find appropriate partition */
+	pos = range_partition_bsearch(ctx->key, ctx->desc, &pair.pconst->constvalue, &nearest_pos);
+	strategy = get_op_opfamily_strategy(expr->opno, ctx->key->partopfamily[0]);
+
+	/* If partition wasn't found then determine the nearest one */
+	if (pos < 0)
+	{
+		pos = nearest_pos;
+		switch (strategy)
+		{
+			case BTEqualStrategyNumber:
+				return NIL;
+			case BTLessStrategyNumber:
+			case BTLessEqualStrategyNumber:
+				upper = pos;
+				if (!ctx->desc->rangelowers[pos]->infinite &&
+					leftof(ctx->key, &pair.pconst->constvalue, ctx->desc->rangelowers[pos]))
+				{
+					if (pos > 0)
+						upper = pos - 1;
+				}
+				break;
+			case BTGreaterStrategyNumber:
+			case BTGreaterEqualStrategyNumber:
+				lower = pos;
+				if (!ctx->desc->rangeuppers[pos]->infinite &&
+					rightof(ctx->key, &pair.pconst->constvalue, ctx->desc->rangeuppers[pos]))
+				{
+					if (pos < ctx->desc->nparts - 1)
+						lower = pos + 1;
+				}
+				break;
+		}
+		return list_make1(make_range(lower, upper));
+	}
+
+	/*
+	 * Based on partition position and operator determine which partitions we
+	 * should return. For example if it is '<=' operator then we should return
+	 * this partition and all the partition that stand left from it, etc.
+	 */
+	switch (strategy)
+	{
+		case BTEqualStrategyNumber:
+			upper = lower = pos;
+			break;
+
+		case BTLessStrategyNumber:
+			upper = pos;
+			/*
+			 * If operator is '<' and CONST value is a lower boundary then
+			 * exclude current partition from rangeset
+			 */
+			if (!ctx->desc->rangelowers[pos]->infinite &&
+				!compare_range_keys(ctx->key, &pair.pconst->constvalue,
+									ctx->desc->rangelowers[pos]->val))
+			{
+				if (pos > 0)
+					upper = pos - 1;
+				else
+					return NIL;
+			}
+			break;
+
+		case BTLessEqualStrategyNumber:
+			upper = pos;
+			break;
+
+		case BTGreaterStrategyNumber:
+			lower = pos;
+			/*
+			 * If operator is '>' and CONST value is an upper boundary then
+			 * exclude current partition from rangeset
+			 */
+			if (!ctx->desc->rangeuppers[pos]->infinite &&
+				!compare_range_keys(ctx->key, &pair.pconst->constvalue,
+									ctx->desc->rangeuppers[pos]->val))
+			{
+				if (pos < ctx->desc->nparts - 1)
+					lower = pos + 1;
+				else
+					return NIL;
+			}
+
+		case BTGreaterEqualStrategyNumber:
+			lower = pos;
+			break;
+	}
+
+	return list_make1(make_range(lower, upper));
+}
+
+/*
+ * pull_var_const
+ *		Check if given expression has 'VAR OP CONST' structure. If it's true
+ * then store Var and Const nodes into pair object
+ */
+static bool
+pull_var_const(const OpExpr *expr, var_const_pair *pair)
+{
+	Node	*firstarg = NULL,
+			*secondarg = NULL;
+
+	if (!IsA(expr, OpExpr))
+		return false;
+
+	if (list_length(expr->args) == 2)
+	{
+		firstarg = (Node *) linitial(expr->args);
+		secondarg = (Node *) lsecond(expr->args);
+
+		if (IsA(firstarg, Var) && IsA(secondarg, Const))
+		{
+			pair->pvar = (Var *) firstarg;
+			pair->pconst = (Const *) secondarg;
+			return true;
+		}
+		else if (IsA(secondarg, Var) && IsA(firstarg, Const))
+		{
+			pair->pvar = (Var *) secondarg;
+			pair->pconst = (Const *) firstarg;
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * get_range_partitions_for_arropexpr
+ *		Returns rangeset for scalar array expression
+ */
+static List *
+get_range_partitions_for_arropexpr(const ScalarArrayOpExpr *expr, walk_expr_tree_context *ctx)
+{
+	Node	   *left = (Node *) linitial(expr->args),
+			   *right = (Node *) lsecond(expr->args);
+	Var		   *var;
+	ArrayExpr  *array;
+	List	   *result = NIL;
+	ListCell   *l;
+	int			pos;
+
+	/*
+	 * If leftside isn't a variable or rightside isn't an array expression
+	 * then just return all whole partitions range
+	 */
+	if (!IsA(left, Var) || !IsA(right, ArrayExpr))
+		return list_make1(make_range(0, ctx->desc->nparts - 1));
+
+	var = (Var *) left;
+	array = (ArrayExpr *) right;
+
+	/* If variable isn't a partition key */
+	if (var->varno != ctx->index || var->varattno != ctx->key->partattrs[0])
+		return list_make1(make_range(0, ctx->desc->nparts - 1));
+
+	if (array)
+	{
+		int16		elmlen;
+		bool		elmbyval;
+
+		get_typlenbyval(array->element_typeid,
+						&elmlen, &elmbyval);
+
+		/* Construct ranges list */
+		foreach(l, array->elements)
+		{
+			Node *c = (Node *) lfirst (l);
+
+			if (!IsA(c, Const))
+				return list_make1(make_range(0, ctx->desc->nparts - 1));
+
+			/* Use binary search to find appropriate partition */
+			pos = range_partition_bsearch(ctx->key, ctx->desc, &((Const *) c)->constvalue, NULL);
+			result = rangeset_union(result, list_make1(make_range(pos, pos)));
+		}
+
+		return result;
+	}
+
+	result = list_make1(make_range(0, ctx->desc->nparts - 1));
+	return result;
+}
+
+/*
+ * make_range
+ *		Creates partitions range
+ */
+static IndexRange *
+make_range(uint32 lower, uint32 upper)
+{
+	IndexRange *r = palloc0(sizeof(IndexRange));
+
+	r->lower = lower;
+	r->upper = upper;
+	return r;
+}
+
+/*
+ * make_range_intersection
+ *		Returns ranges intersection
+ *
+ * Note: caller must ensure that ranges are intersect
+ */
+static IndexRange *
+make_range_intersection(const IndexRange *a, const IndexRange *b)
+{
+	return make_range(Max(a->lower, b->lower),
+					  Min(a->upper, b->upper));
+}
+
+/*
+ * make_range_union
+ *		Returns ranges union
+ *
+ * Note: caller must ensure that ranges are conjucted
+ */
+static IndexRange *
+make_range_union(const IndexRange *a, const IndexRange *b)
+{
+	return make_range(Min(a->lower, b->lower),
+					  Max(a->upper, b->upper));
+}
+
+/*
+ * ranges_intersect
+ *		Returns true if ranges are intersect and false otherwise
+ */
+static bool
+ranges_intersect(const IndexRange *a, const IndexRange *b)
+{
+	return (a->lower <= b->upper) &&
+		   (b->lower <= a->upper);
+}
+
+/*
+ * ranges_conjuncted
+ *		Returns true if ranges are conjucted and false otherwise
+ */
+static bool
+ranges_conjuncted(const IndexRange *a, const IndexRange *b)
+{
+	return (a->lower <= b->upper+1) &&
+		   (b->lower <= a->upper+1);
+}
+
+/*
+ * rangeset_union
+ *		Builds a union for two given rangesets
+ */
+static List *
+rangeset_union(List *a, List *b)
+{
+	ListCell   *ca,
+			   *cb;
+	List	   *result = NIL;
+	IndexRange *range_a,
+			   *range_b;
+	IndexRange *cur;
+	IndexRange *next;
+	bool		have_cur = false;
+
+	ca = list_head(a);
+	cb = list_head(b);
+
+	while (ca || cb)
+	{
+		/* Fetch next range with lesser lower bound */
+		if (ca && cb)
+		{
+			range_a = (IndexRange *) lfirst(ca);
+			range_b = (IndexRange *) lfirst(cb);
+
+			if (range_a->lower <= range_b->lower)
+			{
+				next = range_a;
+				ca = lnext(ca);
+			}
+			else
+			{
+				next = range_b;
+				cb = lnext(cb);
+			}
+		}
+		else if (ca)
+		{
+			next = (IndexRange *) lfirst(ca);
+			ca = lnext(ca);
+		}
+		else if (cb)
+		{
+			next = (IndexRange *) lfirst(cb);
+			cb = lnext(cb);
+		}
+
+		if (!have_cur)
+		{
+			/* Put this range as current value if don't have it yet */
+			cur = next;
+			have_cur = true;
+		}
+		else
+		{
+			if (ranges_conjuncted(next, cur))
+			{
+				cur = make_range_union(next, cur);
+			}
+			else
+			{
+				/*
+				 * Next range is not conjuncted with current. Put current to the
+				 * result list and put next as current.
+				 */
+				result = lappend(result, cur);
+				cur = next;
+			}
+		}
+	}
+
+	/* Put current value into result list if any */
+	if (have_cur)
+		result = lappend(result, cur);
+
+	return result;
+}
+
+/*
+ * rangeset_intersection
+ *		Builds an intersection for two given rangesets
+ */
+static List *
+rangeset_intersection(List *a, List *b)
+{
+	ListCell   *ca,
+			   *cb;
+	List	   *result = NIL;
+	IndexRange *ra, *rb;
+
+	ca = list_head(a);
+	cb = list_head(b);
+
+	while (ca && cb)
+	{
+		ra = lfirst(ca);
+		rb = lfirst(cb);
+
+		/* Only care about intersecting ranges */
+		if (ranges_intersect(ra, rb))
+		{
+			IndexRange	*intersect, *last;
+
+			intersect = make_range_intersection(ra, rb);
+			if (result != NIL)
+			{
+				last = llast(result);
+				if (ranges_conjuncted(last, intersect))
+					*last = *make_range_union(last, intersect);
+				else
+					result = lappend(result, intersect);
+			}
+			else
+			{
+				result = lappend(result, intersect);
+			}
+		}
+
+		/*
+		 * Fetch next ranges. We use upper bound of current range to determine
+		 * which lists to fetch, since lower bound of next range is greater (or
+		 * equal) to upper bound of current.
+		 */
+		if (ra->upper <= rb->upper)
+			ca = lnext(ca);
+		if (ra->upper >= rb->upper)
+			cb = lnext(cb);
+	}
+
+	return result;
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 552b756..9bb8ead 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -35,6 +35,7 @@
 #include "access/sysattr.h"
 #include "catalog/pg_inherits_fn.h"
 #include "catalog/pg_type.h"
+#include "catalog/partition.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -45,6 +46,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/tlist.h"
+#include "optimizer/clauses.h"
 #include "parser/parse_coerce.h"
 #include "parser/parsetree.h"
 #include "utils/lsyscache.h"
@@ -1360,7 +1362,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 	PlanRowMark *oldrc;
 	Relation	oldrelation;
 	LOCKMODE	lockmode;
-	List	   *inhOIDs;
+	List	   *inhOIDs = NIL;
 	List	   *appinfos;
 	ListCell   *l;
 
@@ -1403,8 +1405,17 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 	else
 		lockmode = AccessShareLock;
 
-	/* Scan for all members of inheritance set, acquire needed locks */
-	inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+	/* Determine partitions for partitioned relation */
+	if (relid_is_partitioned(parentOID) && root->parse->jointree)
+	{
+		Node *quals = eval_const_expressions(root, root->parse->jointree->quals);
+
+		inhOIDs = get_partitions_by_quals((Expr *) quals, parentOID, rti, lockmode);
+	}
+
+	if (!inhOIDs)
+		/* Scan for all members of inheritance set, acquire needed locks */
+		inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
 
 	/*
 	 * Check that there's at least one descendant, else treat as no-child
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index f4840e4..5eb9971 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -134,4 +134,5 @@ extern int get_partition_for_tuple(PartitionDescNode pdnode,
 					TupleTableSlot *slot,
 					EState *estate,
 					Oid *failed_at);
+extern List *get_partitions_by_quals(Expr *quals, Oid relationId, Index rti, LOCKMODE lockmode);
 #endif   /* PARTITION_H */
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to