Re: [habari-dev] [PostgreSQL] - SCHEMA.SQL - CREATE SEQUENCE Improvements

cga Wed, 22 Jun 2011 17:24:03 -0700

Hi Chris,

I appreciate your comments, but keep in mind that I'm not a DBA nor a SQLNinja 
nor a PostgreSQL, expert. I'm learning and trying to help. That's all, and 
that was specified too. Any contribution that can make me grow is more than 
welcome (of course while I do something hoepfully useful for Habari too).


The chat logs are very informative about the things I mentioned. I attached 
them on puropose so people could read what they exactly told me when asking 
help and seeking info/trying to understand what I was going to do. I might 
have misunderstood a point, even if I asked until I got it, so their written 
and logged words are to be a better source if you please. 

One thing is for sure for me and my personal knwoledge: they are certainly 
expert on the matter (more than me) and I trust their word as much as I trust 
yours. (the PKEY issue was briefly chatted in perhaps 4 or 5 lines, but you got 
it right, it is something that I would implement if i was to plan a schema 
from scratch and still would like to if it doesn't break a thing in Habari)

Now, 

<BIGINT>
I could agree on the use of BIGINT if BIGINT keeps you safer for the reasons 
you mentioned, fair enough. After "deciding" to use only INTEGER (2 billions 
of post for a blog? eh!) I made a joke:

>[Wednesday 22 June 2011] [00:22:55]     * cga 30yrs later: damn.... this is 
ipv4 vs ipv6 all over again..... i should have used BIGSERIAL!!! :P

That reflects my first intention to stick w/ BIGINT/BIGSERIAL. 

But still: is this a ecommerce platform or a blog engine? Do you post w/ 
scripts ever second? Wouldn't be better to use twitter widgets instead of 
spamming the db? Shouldn't plugins log to filesystem? 

Anyway if BIGINT makes you folks feel safer, I agree to use it.

<ALTER SEQUENCE ... OWNED BY ... ;>
Allow me to say that if you used ALTER SEQUENCE ... OWNED BY ... since the 
beginning you could extract the *exact* sequence point, not guess it, during 
migration prior to drop, and used it for anything related to migration. Hence

>we would generate duplicate IDs - a Very Bad Thing (tm).

wouldn't to be the case. At all. That's what I meant with:

> 1 - It's *almost* correct, but no cigar. It basically miss the most crucial
> part of the query to make the SEQUENCE itself have any sense *and* work
> properly

Anyway this is something I want to test, like I replied to Rick in the other 
email. Tomorrow I will test.

RhodiumToad already told me that they will not look the same across "old 
style" and "new style", but still I want to try. At least to check that the 
use "old style" w/ ALTER SEQUENCE ... OWNED BY ... ; could be good enough and 
prove you wrong on the migrate issue. (speaking of old style vs old style w/ 
ALTER OWNED).

As for legacy support (pgsql 8.1 in debian >etch?) I can see why you did it, 
but as you guessed correctly that the most wide used should be 8.3 and 8.4 
(which is my case on lenny w/ bpo repo) and marginally 9.x on cutting edge 
distro like Arch (my distro of choice for desktop use).

So applying ALTER SEQUENCE ... OWNED BY .... ; it is not as much of a problem 
unless you use EOL version of PGSQL . If they introduced such a feature it 
might have been for a good reason. The above mentioned could be it, for one.

http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy

Unless of course you want to support all of the EOL products, good luck.

You also aid that:

> So the only advantage we'd get is that if the table were to be dropped 
> the sequence would be automatically deleted with it. 

Not only this....

> I can't see any instance in which we would be dropping the Posts table (or 
most of the others) programmatically, so it really provides no extra value for 
developers.

....in fact it allows you a key feature that is the main reason why you could 
have migrated w/o guessing. The use of:

SELECT currval(pg_get_serial_sequence('tablename', 'colname'));

pg_get_serial_sequence() is available only if you either use SERIAL or the 
"old method" correctly (w/ ALTER OWNED).

>I don't understand this. Can you explain how it's ever going to behave 
> in an unexpected way? How would a sequence ever result in duplicate 
> numbers, unless it has been manually ALTERed with the RESTART argument? 
>.....
> The use of the sequence, whether OWNED BY or not, should have nothing to 
> do with whether the destination field is an INT or a BIGINT.

My bad, I misuderstood the problem of misbehaviour, I mistook numbers with 
names... so I'll let you read what RhodiumToad told me about it, the very first 
minutes after I asked for help. 

From:
> [Tuesday 21 June 2011] [22:00:10] <RhodiumToad> you should avoid assuming 
you know the sequence name, of course - use insert ... returning or 
pg_get_serial_sequence

To at least:
>[Tuesday 21 June 2011] [22:11:50] <RhodiumToad> we're talking about the NAME 
of the sequence, not the value
>[Tuesday 21 June 2011] [22:11:59] <cga> thanks a lot RhodiumToad 

BUT *IF* Habari also relies on sequences' names too, you are to be 
experiencing issues and headaches. == RPITA.

<SERIAL>
>I'm not sure I understand your point here either. The sequence name is 
>dynamically created during install based on the DB prefix you specify, 
>yes.

Consider this (from the very first lines of logs):

> [Tuesday 21 June 2011] [21:59:28] <RhodiumToad> those are equivalent, yes 
(the first one is of course impossible due to the circular reference)

> [Tuesday 21 June 2011] [22:00:10] <RhodiumToad> you should avoid assuming 
you know the sequence name, of course - use insert ... returning or 
pg_get_serial_sequence

He's talking about 1st == your schema.sql w/ use of ALTER SEQUENCE ; 2nd my 
patches.

>  Again, how is that going to cause a problem and (aside from using a 
>SERIAL, as mentioned below) what other alternative is there?

Using SERIAL you lets pgsql determine things sequence related, while 
preserving your ${prefix}foo thing intacted for the table, and have the 
advantages mentioned in docs for DATATYPE-SERIAL and my first email.

Anyway,

If you don't want to implement SERIAL it's another matter on its own. Yet the 
pros as more than the cons you've mentioned (most of them are obsolete anyway 
or non issues to my way of seeing this if we don't consider existing db for a 
day or two until i test).

> On the flip side, since they're the same there's likewise no real 
>compelling reason to use SERIAL over a sequence. We had previously (and 
>hopefully still do) wanted to also support Oracle, which is very similar 
> to Postgres. Oracle lacks a SERIAL data type, so it would require the 
> sequence method to be used anyway. Continuing to use sequences in 
> Postgres would mean less variation between the two and make it a good 
> bit easier to manage them.

I agree that if you want to support more DB you are to keep things as similar 
as possible. Yet I believe you should try using  "specific" 
paramenters/features if they don't cost too much or break anything or are 
called differently across DB but still do the exact same things..

In fact you already use nextval() which is somewhat pgsql only. 

http://www.postgresql.org/docs/8.4/interactive/sql-createsequence.html  
(bottom of the page)

Then if you are already using a "specific" parameter to pgsql way of doing 
this, why don't just use a proper way to do it? Which is basically the whole 
point?

good night and thanks for your feedback =)

On Thursday 23 June 2011 00:12:43 Chris Meller wrote:
> I've skimmed over the chat logs you referenced, but I don't see the vast
> majority of these points discussed... Primarily all I see is a
> discussion about using ID alone as the primary key vs. other fields. So
> I'm going to go through (below) and ask several questions to clarify.
> 
> > ------------------------------------------------------------------------
> > 
> >     cga <mailto:[email protected]>
> > 
> > June 22, 2011 3:13 PM
> > 
> > 
> > Hi all,
> > 
> > <PREMISE>
> > you might forgive me if I don't use Trac, but I'd like to explain and
> > discuss what I'm doing with PGSQL schema.sql for Habari before
> > submitting (if my contrib will be accepted code wise) anything.
> > 
> > Not to mention that in these very days things are getting moved on
> > Github. If so, I would clone and request for pull once there, since I
> > have an active account on Github and it seems pointless to get a Trac
> > account now.
> > 
> > BUT: since this very patch isn't going (at least theoretically) to have
> > any bad effect on Habari , I'm attaching it to this very email for now.
> > For you to test. Forgive me on this last thing but I haven't set up a
> > test env just yet. I'll do it after sending this patch.
> > 
> > <little INTRO and WHY>
> > I'm cga. I'm a Linux SysAdmin and I'm all for "best practices" as much as
> > I can. Not to mention I'm really paranoid and analytical when it
> > concerns coding/installing/configuring/deploying. Especially for things
> > I care/use/ administer.
> > 
> > I basically started this little contribution because I really love my
> > stack: debian + postgresql + nginx and I'd like to use it for all my
> > needs. Habari included. Ref:
> > http://drunkenmonkey.org/irc/habari/2011-06-19#T17-04-24 I think
> > it'sfair enough for a small intro.
> > 
> > <CONTRIB AND REASONING>
> > As you can read in the chat log I linked above , I'm not expert in
> > neither PostgreSQL nor SQL but I like to contribute for as much as I
> > can. Especially if this can be the only way to keep Habari's support for
> > PostgreSQL alive.
> > 
> > This first little but important contribution it is all about the "CREATE
> > SEQUENCE" used in PGSQL actual schema.sql.
> > 
> > All the reasoning I do it's to be documented w/ links to official docs
> > and (attacched) chat logs from freenode.#postgresql , where (beside docs
> > and wiki) I learn and ask my stuff. And to my great pleasure I always
> > find very competent and very helpful people.
> > 
> > Please forgive my English errors, I'm Italian. Please *really* forgive
> > my verbosity but I'm sure you'd understand that people who didn't follow
> > in freenode.#habari don't know all the reasoning from the room. If I'm
> > to be accepted I promise I'll be as concise as possible in next
> > communications.
> > 
> > Here we go:
> > 
> > I noticed that you use a lot of serialization in schema.sql , w/o
> > questioning why (which I'm not in a position to, anyway) I noticed that
> > you (being whoever did that schema.sql) do it wrong. Why?
> > 
> > Because: CREATE SEQUENCE done in the present form from schema.sql:
> >> CREATE SEQUENCE {$prefix}posts_pkey_seq;
> >> CREATE TABLE {$prefix}posts (
> >> 
> >>   id BIGINT NOT NULL DEFAULT nextval('{$prefix}posts_pkey_seq'),
> >> 
> >> .....
> >> .....
> >> 
> >>   PRIMARY KEY (id),
> >> 
> >> .....
> >> );
> > 
> > 1 - It's *almost* correct, but no cigar. It basically miss the most
> > crucial part of the query to make the SEQUENCE itself have any sense
> > *and* work
> > 
> > properly:
> >> ALTER SEQUENCE  {$prefix}posts_pkey_seq OWNED BY{$prefix}posts.id;
> 
> The OWNED BY argument was only added in 8.2. The last time I tested the
> PG support (admittedly a long time ago) Debian only had 8.1 in its
> repos, so that was a major deal. I'm not sure what is commonly available
> now (though I suspect 8.3 is widely used), but breaking support for 8.1
> users is something to be considered, particularly since I see no real
> advantage to adding the statement in our use case.
> 
> Per the Postgres documentation
> <http://www.postgresql.org/docs/8.3/static/sql-createsequence.html> on
> CREATE SEQUENCE, OWNED BY merely associates the sequence with the table
> (and field) specified so that if the table (or field) is dropped the
> sequence will be automatically dropped as well. This sounds like a good
> idea at first, but I can see instances in which this might cause problems.
> 
> In the past there have been times that we've needed to modify a table,
> but one RDBMS or another won't support the right ALTER syntax, so we've
> created a new field, migrated data, dropped the original field, and then
> reversed the process. If we used OWNED BY on the sequence as you suggest
> the sequence would be automatically dropped and likely cause a ton of
> problems. Not only would we have to remember to re-create the sequence
> (manually and specifically for Postgres), but we would have to guess at
> what the starting value should be, otherwise we would generate duplicate
> IDs - a Very Bad Thing (tm).
> 
> So the only advantage we'd get is that if the table were to be dropped
> the sequence would be automatically deleted with it. That's nice for a
> user who is no longer using Habari, but hardly a major deal - it's an
> extra query per table to drop its relevant sequence or a few extra
> clicks in your GUI of choice. I can't see any instance in which we would
> be dropping the Posts table (or most of the others) programmatically, so
> it really provides no extra value for developers.
> 
> > w/o this ALTER TABLE the SEQUENCE is to behave weird more sooner than
> > later, resulting in duplicate numbers (see chat log; using BIGINT won't
> > save you from this, more on this later) and errors that are more than
> > diffcult to find when they happen.
> 
> I don't understand this. Can you explain how it's ever going to behave
> in an unexpected way? How would a sequence ever result in duplicate
> numbers, unless it has been manually ALTERed with the RESTART argument?
> Even if we used OWNED BY, you could still screw the process up by
> manually changing the sequence, so that wouldn't solve the problem either.
> 
> The use of the sequence, whether OWNED BY or not, should have nothing to
> do with whether the destination field is an INT or a BIGINT.
> 
> > 2 - it determines in a recursive way the name of the SEUQNECE itself ,
> > which is wrong because you cannot be 100% sure that will actually work
> > as intended. (see chat log).
> 
> I'm not sure I understand your point here either. The sequence name is
> dynamically created during install based on the DB prefix you specify,
> yes. Again, how is that going to cause a problem and (aside from using a
> SERIAL, as mentioned below) what other alternative is there?
> 
> > 3 - it uses BIGINT. My reasoning is that whoever created this schema.sql
> > didn't wan't to finish the SEQUENCE too soon. Good idea , bad
> > implementation. I strongly believe that BIGINT is not needed indeed, and
> > that it's just a waste of cylces and space. Not to mention that BIGINT
> > behaves differently on 32bit and 64bit, causing slowness (and perhaps
> > some more side effects) on 32bit. (this is chat logs from the office, if
> > you need them I'll attach them tomorrow). And as explained just above,
> > it is not going to be used as a BIGINT anyway because of the wrong query
> > CREATE SEQUENCE. But the main point why it is not needed it's because
> > who's to write a post every second for # quintillions years? I think
> > that a post every second for 68yrs is more than enough (INTEGER/SERIAL).
> > (see chat log too).
> 
> I cannot speak towards any performance differences between 32bit and
> 64bit versions of Postgres, though I'm admittedly skeptical. We use
> BIGINT everywhere we can for ID columns (at least in MySQL and SQLite,
> looks like Postgres might have been missed) for one simple reason: we
> can. The storage difference between INT and BIGINT varies between
> platform but is so negligible we've never worried about it.
> 
> On the other hand there are instances in which it may be of value to
> have the added capacity. A few off the top of my head:
> 
> 1. Logs. Particularly if a plugin starts generating errors, you could
> conceivably start generating a TON of IDs. Over the course of many years
> running Habari it's not entirely outside the realm of possibility that
> you could exhaust an INT.
> 
> 2. Posts that aren't posts. While most of us use Habari only for
> blogging, there are several instances where there have been a lot of
> other things built on top of it. Think product catalogs / online stores,
> forums, etc. It may seem crazy to create a new post every second for 68
> years, but it seems a lot less crazy if you stop thinking about each
> entry as a time-consuming piece of written content. When you also
> include multiple types of content (like revisions, tweets, links, etc.)
> and consider that there might be a lot of turnover (or "churn") there's
> all the more reason to go ahead and use a BIGINT.
> 
> If an INT column were to be maxed out it's a simple matter for a DBA to
> go in and change it (after figuring out what the problem actually was,
> of course), but recreating the indexes could be a very time consuming
> process as well. I believe this is actually one of the primary reasons
> WordPress decided to switch to BIGINT fields many years ago.
> 
> So basically there are several reasons to use BIGINTs, even if they are
> long-shots, and no real reason not to (aside from your suggestion that
> there's a performance difference, which we've never heard of before).
> Think of it as a rather hopeful best practice.
> 
> > 4 - when all the above are to be combined w/ the use of NOT NULL DEFAULT
> > nextval()" (and I mean all of them are compulsory clauses, see chat log),
> > you can save a lot of hassles, code and cycles with just the improvement
> > I'm submitting for every CREATE SEQUENCE query, leaving intact the final
> > result for what it concerns Habari's functioning *and* improves things a
> > whole lotta
> > 
> > love:
> >> CREATE TABLE {$prefix}posts (
> >> 
> >>   id SERIAL,
> >> 
> >> .....
> >> .....
> >> 
> >>   PRIMARY KEY (id),
> >> 
> >> .....
> >> );
> 
> Ignoring everything in the last 3 points related to sequences, I'm
> really not sure why we never used SERIAL fields. From an implementation
> standpoint they are identical to our approach (the Postgres
> documentation
> <http://www.postgresql.org/docs/8.1/static/datatype.html#DATATYPE-SERIAL>
> even shows you the equivalent syntax using a regular sequence), but they
> would have been more analogous to the AUTO INCREMENT fields we use in
> MySQL and SQLite.
> 
> On the flip side, since they're the same there's likewise no real
> compelling reason to use SERIAL over a sequence. We had previously (and
> hopefully still do) wanted to also support Oracle, which is very similar
> to Postgres. Oracle lacks a SERIAL data type, so it would require the
> sequence method to be used anyway. Continuing to use sequences in
> Postgres would mean less variation between the two and make it a good
> bit easier to manage them.
> 
> > <IMPROVEMENTS ACHIEVED>
> > Improvements and optimizations achieved with this little patch: (see Ref.
> > [2])
> > 
> > 0 - it will leave intact the need of Habari to use the numerical field
> > for internal functioning.
> > 1 - (id) will be unique and auto incremental by properties inherited from
> > query.
> > 2 - You won't skip a number in a SEQUENCE.
> > 3 - It won't fail because of wrong creation query.
> > 4 - It won't determine an unexpected SEQUENCE name/relation to the column
> > you want it to.
> > 5 - OWNED BY is to be used implicitly , this will drop anything related
> > to the elected talbe and/or column. This sounds harsh, but what will you
> > do w/ data that isn't associated w/ anything anymore?
> > 
> > <FINAL THOUGHTS>
> > 1 - For the impact on Habari's code (which I didn't read because I don't
> > do php and couldn't afford to get it wrong) on insert the next value of
> > the sequence into the serial column please read Ref. [2] just below the
> > PostgreSQL 7.3 Note.
> > 
> > 2 - This new schema.sql would work (in theory) for new installations but
> > I don't know how to implement and test a
> > system/schema/pgsql/upgrades/post/*.sql yet. Help and advice from you is
> > very much apprecciated.
> > 
> > 3 - Let me know if anyone is going to test this before I do. If you want.
> > Anyway I'm going to test it when my test env will be up and running.
> > 
> > 4 - I'm already planning the next batch of fixes but that can wait for
> > now. I anticipate that it will be about optimizing queries and about
> > keyvil issues as much as possible and w/o breaking Habari Ref. [3].
> > 
> > My pleasure to meet you.
> > 
> > <REFERENCES>
> > Ref. chat log attacched.
> > 
> > Ref [1]. http://www.postgresql.org/docs/8.4/interactive/sql-
> > createsequence.html
> > 
> > Ref [2]. http://www.postgresql.org/docs/8.4/interactive/datatype-
> > numeric.html#DATATYPE-SERIAL
> > 
> > Ref. [3]
> > http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327
> > http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-ii-7345
> > http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-iii-7365

-- 
knowledge has no owner, only means to reveal itself

-- 
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/habari-dev

Re: [habari-dev] [PostgreSQL] - SCHEMA.SQL - CREATE SEQUENCE Improvements

Reply via email to