Hello,
I'll add this to CF2017-09.
At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote
<[email protected]> wrote in
<[email protected]>
> On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
> >>
> >> I recall I had proposed a fix for the same thing some time ago [1].
> >
> > Wow. About 1.5 years ago and stuck? I have a concrete case where
> > this harms so I'd like to fix that this time. How can we move on?
>
> Agreed that this should be fixed.
>
> Your proposed approach #1 to recheck the inheritance after obtaining the
> lock on the child table seems to be a good way forward.
>
> Approach #2 of reordering locking is a simpler solution, but does not
> completely prevent the problem, because DROP TABLE child can still cause
> it to occur, as you mentioned.
>
> >>> The cause is that NO INHERIT doesn't take an exlusive lock on the
> >>> parent. This allows expand_inherited_rtentry to add the child
> >>> relation into appendrel after removal from the inheritance but
> >>> still exists.
> >>
> >> Right.
> >>
> >>> I see two ways to fix this.
> >>>
> >>> The first patch adds a recheck of inheritance relationship if the
> >>> corresponding attribute is missing in the child in
> >>> make_inh_translation_list(). The recheck is a bit complex but it
> >>> is not performed unless the sequence above is happen. It checks
> >>> duplication of relid (or cycles in inheritance) following
> >>> find_all_inheritors (but doing a bit different) but I'm not sure
> >>> it is really useful.
> >>
> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> >> I guess your hash table based solution will do the job as far as
> >> performance of this check is concerned, although I haven't checked the
> >> code closely.
> >
> > The hash table is not crucial in the patch. Substantially it just
> > recurs down looking up pg_inherits for the child. I considered
> > the new index but abandoned because I thought that such case
> > won't occur so frequently.
>
> Agreed. BTW, the hash table in patch #1 does not seem to be really
> helpful. In the while loop in is_descendant_of_internal(), does
> hash_search() ever return found = true? AFAICS, it does not.
>
> >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> >>> the parent first.
> >>>
> >>> Since the latter has a larger impact on the current behavior and
> >>> we already treat "DROP TABLE child" case in the similar way, I
> >>> suppose that the first approach would be preferable.
> >>
> >> That makes sense.
So, I attached only the first patch, rebased on the current
master (It actually failed to apply on it.) and fixed a typo in a
comment.
This still runs a closed-loop test using temporary hash but it
seems a bit paranoic. (this is the same check with what
find_all_inheritors is doing)
<< the following is another topic >>
> >> BTW, in the partitioned table case, the parent is always locked first
> >> using an AccessExclusiveLock. There are other considerations in that case
> >> such as needing to recreate the partition descriptor upon termination of
> >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> >
> > Apart from the degree of concurrency, if we keep parent->children
> > order of locking, such recreation does not seem to be
> > needed. Maybe I'm missing something.
>
> Sorry to have introduced that topic in this thread, but I will try to
> explain anyway why things are the way they are currently:
>
> Once a table is no longer a partition of the parent (detached or dropped),
> we must make sure that the next commands in the transaction don't see it
> as one. That information is currently present in the relcache
> (rd_partdesc), which is used by a few callers, most notably the
> tuple-routing code. Next commands must recreate the entry so that the
> correct thing happens based on the updated information. More precisely,
> we must invalidate the current entry. RelationClearRelation() will either
> delete the entry or rebuild it. If it's being referenced somewhere, it
> will be rebuilt. The place holding the reference may also be looking at
> the content of rd_partdesc, which we don't require them to make a copy of,
> so we must preserve its content while other fields of RelationData are
> being read anew from the catalog. We don't have to preserve it if there
> has been any change (partition added/dropped), but to make such a change
> one would need to take a strong enough lock on the relation (parent). We
> assume here that anyone who wants to reference rd_partdesc takes at least
> AccessShareLock lock on the relation, and anyone who wants to change its
> content must take a lock that will conflict with it, so
> AccessExclusiveLock. Note that in all of this, we are only talking about
> one relation, that is the parent, so parent -> child ordering of taking
> locks may be irrelevant.
I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/catalog/pg_inherits.c
--- b/src/backend/catalog/pg_inherits.c
***************
*** 42,47 **** typedef struct SeenRelsEntry
--- 42,49 ----
ListCell *numparents_cell; /* corresponding list cell */
} SeenRelsEntry;
+ static bool is_descendent_of_internal(Oid parentId, Oid childId,
+ HTAB *seen_rels);
/*
* find_inheritance_children
*
***************
*** 400,402 **** typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
--- 402,472 ----
return result;
}
+
+ /*
+ * Check if the child is really a descendent of the parent
+ */
+ bool
+ is_descendent_of(Oid parentId, Oid childId)
+ {
+ HTAB *seen_rels;
+ HASHCTL ctl;
+ bool ischild = false;
+
+ memset(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(Oid);
+ ctl.hcxt = CurrentMemoryContext;
+
+ seen_rels = hash_create("is_descendent_of temporary table",
+ 32, /* start small and extend */
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+
+ hash_destroy(seen_rels);
+
+ return ischild;
+ }
+
+ static bool
+ is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+ {
+ Relation inhrel;
+ SysScanDesc scan;
+ ScanKeyData key[1];
+ bool ischild = false;
+ HeapTuple inheritsTuple;
+
+ inhrel = heap_open(InheritsRelationId, AccessShareLock);
+ ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+ BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(parentId));
+ scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+ NULL, 1, key);
+
+ while ((inheritsTuple = systable_getnext(scan)) != NULL)
+ {
+ bool found;
+ Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+
+ hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+
+ /*
+ * Recursively check into children. Although there can't theoretically
+ * be any cycles in the inheritance graph, check the cycles following
+ * find_all_inheritors.
+ */
+ if (inhrelid == childId ||
+ (!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+ {
+ ischild = true;
+ break;
+ }
+ }
+
+ systable_endscan(scan);
+ heap_close(inhrel, AccessShareLock);
+
+ return ischild;
+ }
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***************
*** 100,109 **** static List *generate_append_tlist(List *colTypes, List *colCollations,
static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
Index rti);
! static void make_inh_translation_list(Relation oldrelation,
Relation newrelation,
! Index newvarno,
! List **translated_vars);
static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
List *translated_vars);
static Node *adjust_appendrel_attrs_mutator(Node *node,
--- 100,108 ----
static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
Index rti);
! static List *make_inh_translation_list(Relation oldrelation,
Relation newrelation,
! Index newvarno);
static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
List *translated_vars);
static Node *adjust_appendrel_attrs_mutator(Node *node,
***************
*** 1508,1513 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1507,1527 ----
*/
if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
{
+ List *translated_vars =
+ make_inh_translation_list(oldrelation, newrelation,
+ childRTindex);
+
+ if (!translated_vars)
+ {
+ /*
+ * This childrel is no longer a child of the parent.
+ * Close the child relation releasing locks.
+ */
+ if (childOID != parentOID)
+ heap_close(newrelation, lockmode);
+ continue;
+ }
+
/* Remember if we saw a real child. */
if (childOID != parentOID)
has_child = true;
***************
*** 1517,1524 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
appinfo->child_relid = childRTindex;
appinfo->parent_reltype = oldrelation->rd_rel->reltype;
appinfo->child_reltype = newrelation->rd_rel->reltype;
! make_inh_translation_list(oldrelation, newrelation, childRTindex,
! &appinfo->translated_vars);
appinfo->parent_reloid = parentOID;
appinfos = lappend(appinfos, appinfo);
--- 1531,1537 ----
appinfo->child_relid = childRTindex;
appinfo->parent_reltype = oldrelation->rd_rel->reltype;
appinfo->child_reltype = newrelation->rd_rel->reltype;
! appinfo->translated_vars = translated_vars;
appinfo->parent_reloid = parentOID;
appinfos = lappend(appinfos, appinfo);
***************
*** 1623,1636 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/*
* make_inh_translation_list
* Build the list of translations from parent Vars to child Vars for
! * an inheritance child.
*
* For paranoia's sake, we match type/collation as well as attribute name.
*/
! static void
make_inh_translation_list(Relation oldrelation, Relation newrelation,
! Index newvarno,
! List **translated_vars)
{
List *vars = NIL;
TupleDesc old_tupdesc = RelationGetDescr(oldrelation);
--- 1636,1649 ----
/*
* make_inh_translation_list
* Build the list of translations from parent Vars to child Vars for
! * an inheritance child. Returns NULL if the two relations are no longer in
! * a inheritance relationship.
*
* For paranoia's sake, we match type/collation as well as attribute name.
*/
! static List *
make_inh_translation_list(Relation oldrelation, Relation newrelation,
! Index newvarno)
{
List *vars = NIL;
TupleDesc old_tupdesc = RelationGetDescr(oldrelation);
***************
*** 1700,1707 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
--- 1713,1730 ----
break;
}
if (new_attno >= newnatts)
+ {
+ /*
+ * It's possible that newrelation is no longer a child of the
+ * oldrelation. We just ingore it for the case.
+ */
+ if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id))
+ return NULL;
+
+ /* If still a child, something's wrong. */
elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"",
attname, RelationGetRelationName(newrelation));
+ }
}
/* Found it, check type and collation match */
***************
*** 1720,1726 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
0));
}
! *translated_vars = vars;
}
/*
--- 1743,1749 ----
0));
}
! return vars;
}
/*
*** a/src/include/catalog/pg_inherits_fn.h
--- b/src/include/catalog/pg_inherits_fn.h
***************
*** 23,27 **** extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
--- 23,28 ----
extern bool has_subclass(Oid relationId);
extern bool has_superclass(Oid relationId);
extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId);
+ extern bool is_descendent_of(Oid parentId, Oid childId);
#endif /* PG_INHERITS_FN_H */
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers