Hi Robert,
From: Robert Haas [mailto:[email protected]]
> > * Catalog schema:
> >
> > CREATE TABLE pg_catalog.pg_partitioned_rel
> > (
> > partrelid oid NOT NULL,
> > partkind oid NOT NULL,
> > partissub bool NOT NULL,
> > partkey int2vector NOT NULL, -- partitioning attributes
> > partopclass oidvector,
> >
> > PRIMARY KEY (partrelid, partissub),
> > FOREIGN KEY (partrelid) REFERENCES pg_class (oid),
> > FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
> > )
> > WITHOUT OIDS ;
>
> So, we're going to support exactly two levels of partitioning?
> partitions with partissub=false and subpartitions with partissub=true?
> Why not support only one level of partitioning here but then let the
> children have their own pg_partitioned_rel entries if they are
> subpartitioned? That seems like a cleaner design and lets us support
> an arbitrary number of partitioning levels if we ever need them.
>
Yeah, that's what I thought at some point in favour of dropping partissub
altogether.
However, not that this design solves it, there is one question - if we would
want to support defining for a table both partition key and sub-partition key
in advance? That is, without having defined a first level partition yet; in
that case, what level do we associate sub-(sub-) partitioning key with or more
to the point where do we keep it? One way is to replace partissub by
partkeylevel with level 0 being the topmost-level partitioning key and so on
while keeping the partrelid equal to the pg_class.oid of the parent. That
brings us to next question of managing hierarchies in pg_partition_def
corresponding to partkeylevel in the definition of topmost partitioned
relation. But I guess those are implementation details rather than
representational unless I am being too naïve.
> > CREATE TABLE pg_catalog.pg_partition_def
> > (
> > partitionid oid NOT NULL,
> > partitionparentrel oid NOT NULL,
> > partitionisoverflow bool NOT NULL,
> > partitionvalues anyarray,
> >
> > PRIMARY KEY (partitionid),
> > FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
> > )
> > WITHOUT OIDS;
> >
> > ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;
>
> What is an overflow partition and why do we want that?
>
That would be a default partition. That is, where the tuples that don't belong
elsewhere (other defined partitions) go. VALUES clause of the definition for
such a partition would look like:
(a range partition) ... VALUES LESS THAN MAXVALUE
(a list partition) ... VALUES DEFAULT
There has been discussion about whether there shouldn't be such a place for
tuples to go. That is, it should generate an error if a tuple can't go anywhere
(or support auto-creating a new one like in interval partitioning?)
> What are you going to do if the partitioning key has two columns of
> different data types?
>
Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought.
They are one of the most crucial elements of the scheme.
I wonder if your suggestion of pg_node_tree plays well here. This then could be
a list of CONSTs or some such... And I am thinking it's a concern only for
range partitions, no? (that is, a multicolumn partition key)
I think partkind switches the interpretation of the field as appropriate. Am I
missing something? By the way, I had mentioned we could have two values fields
each for range and list partition kind.
> > * DDL syntax (no multi-column partitioning, sub-partitioning support as
> > yet):
> >
> > -- create partitioned table and child partitions at once.
> > CREATE TABLE parent (...)
> > PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
> > [ (
> > PARTITION child
> > {
> > VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
> > | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
> > }
> > [ WITH ( ... ) ] [ TABLESPACE tbs ]
> > [, ...]
> > ) ] ;
>
> How are you going to dump and restore this, bearing in mind that you
> have to preserve a bunch of OIDs across pg_upgrade? What if somebody
> wants to do pg_dump --table name_of_a_partition?
>
Assuming everything's (including partitioned relation and partitions at all
levels) got a pg_class entry of its own, would OIDs be a problem? Or what is
the nature of this problem if it's possible that it may be.
If someone pg_dump's an individual partition as a table, we could let it be
dumped as just a plain table. I am thinking we should be able to do that or
should be doing just that (?)
> I actually think it will be much cleaner to declare the parent first
> and then have separate CREATE TABLE statements that glue the children
> in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1,
> 10000).
>
Oh, do you mean to do away without any syntax for defining partitions with
CREATE TABLE parent?
By the way, do you mean the following:
CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1, 10000)
Instead of,
CREATE PARTITION child ON parent VALUES LESS THAN 10000?
And as for the dump of a partitioned table, it does sound cleaner to do it
piece by piece starting with the parent and its partitioning key (as ALTER on
it?) followed by individual partitions using either of the syntax above.
Moreover we dump a sub-partition as a partition on its parent partition.
Thanks for your time and valuable input.
Regards,
Amit
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers