Re: [HACKERS] On partitioning

Amit Langote Tue, 02 Dec 2014 19:52:39 -0800

Hi Robert,

From: Robert Haas [mailto:[email protected]]
> > * Catalog schema:
> >
> > CREATE TABLE pg_catalog.pg_partitioned_rel
> > (
> >    partrelid                oid    NOT NULL,
> >    partkind                oid    NOT NULL,
> >    partissub              bool  NOT NULL,
> >    partkey                 int2vector NOT NULL, -- partitioning attributes
> >    partopclass         oidvector,
> >
> >    PRIMARY KEY (partrelid, partissub),
> >    FOREIGN KEY (partrelid)   REFERENCES pg_class (oid),
> >    FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
> > )
> > WITHOUT OIDS ;
> 
> So, we're going to support exactly two levels of partitioning?
> partitions with partissub=false and subpartitions with partissub=true?
>  Why not support only one level of partitioning here but then let the
> children have their own pg_partitioned_rel entries if they are
> subpartitioned?  That seems like a cleaner design and lets us support
> an arbitrary number of partitioning levels if we ever need them.
>


Yeah, that's what I thought at some point in favour of dropping partissub 
altogether. 

However, not that this design solves it, there is one question - if we would 
want to support defining for a table both partition key and sub-partition key 
in advance? That is, without having defined a first level partition yet; in 
that case, what level do we associate sub-(sub-) partitioning key with or more 
to the point where do we keep it? One way is to replace partissub by 
partkeylevel with level 0 being the topmost-level partitioning key and so on 
while keeping the partrelid equal to the pg_class.oid of the parent. That 
brings us to next question of managing hierarchies in pg_partition_def 
corresponding to partkeylevel in the definition of topmost partitioned 
relation. But I guess those are implementation details rather than 
representational unless I am being too naïve.

> > CREATE TABLE pg_catalog.pg_partition_def
> > (
> >    partitionid                      oid     NOT NULL,
> >    partitionparentrel       oid    NOT NULL,
> >    partitionisoverflow     bool  NOT NULL,
> >    partitionvalues             anyarray,
> >
> >    PRIMARY KEY (partitionid),
> >    FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
> > )
> > WITHOUT OIDS;
> >
> > ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;
> 
> What is an overflow partition and why do we want that?
> 

That would be a default partition. That is, where the tuples that don't belong 
elsewhere (other defined partitions) go. VALUES clause of the definition for 
such a partition would look like:

(a range partition) ... VALUES LESS THAN MAXVALUE 
(a list partition) ... VALUES DEFAULT

There has been discussion about whether there shouldn't be such a place for 
tuples to go. That is, it should generate an error if a tuple can't go anywhere 
(or support auto-creating a new one like in interval partitioning?)

> What are you going to do if the partitioning key has two columns of
> different data types?
> 

Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought. 
They are one of the most crucial elements of the scheme.

I wonder if your suggestion of pg_node_tree plays well here. This then could be 
a list of CONSTs or some such... And I am thinking it's a concern only for 
range partitions, no? (that is, a multicolumn partition key)

I think partkind switches the interpretation of the field as appropriate. Am I 
missing something? By the way, I had mentioned we could have two values fields 
each for range and list partition kind.

> > * DDL syntax (no multi-column partitioning, sub-partitioning support as 
> > yet):
> >
> > -- create partitioned table and child partitions at once.
> > CREATE TABLE parent (...)
> > PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
> > [ (
> >      PARTITION child
> >        {
> >            VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
> >          | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
> >        }
> >        [ WITH ( ... ) ] [ TABLESPACE tbs ]
> >      [, ...]
> >   ) ] ;
> 
> How are you going to dump and restore this, bearing in mind that you
> have to preserve a bunch of OIDs across pg_upgrade?  What if somebody
> wants to do pg_dump --table name_of_a_partition?
> 

Assuming everything's (including partitioned relation and partitions at all 
levels) got a pg_class entry of its own, would OIDs be a problem? Or what is 
the nature of this problem if it's possible that it may be.

If someone pg_dump's an individual partition as a table, we could let it be 
dumped as just a plain table. I am thinking we should be able to do that or 
should be doing just that (?)

> I actually think it will be much cleaner to declare the parent first
> and then have separate CREATE TABLE statements that glue the children
> in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1,
> 10000).
> 

Oh, do you mean to do away without any syntax for defining partitions with 
CREATE TABLE parent?

By the way, do you mean the following:

CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1, 10000)

Instead of,

CREATE PARTITION child ON parent VALUES LESS THAN 10000?

And as for the dump of a partitioned table, it does sound cleaner to do it 
piece by piece starting with the parent and its partitioning key (as ALTER on 
it?) followed by individual partitions using either of the syntax above. 
Moreover we dump a sub-partition as a partition on its parent partition.

Thanks for your time and valuable input.

Regards,
Amit




-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On partitioning

Reply via email to