------------------------------------------------------------------------cga <mailto:[email protected]> June 22, 2011 3:13 PM Hi all, <PREMISE> you might forgive me if I don't use Trac, but I'd like to explain and discuss what I'm doing with PGSQL schema.sql for Habari before submitting (if my contrib will be accepted code wise) anything. Not to mention that in these very days things are getting moved on Github. If so, I would clone and request for pull once there, since I have an active account on Github and it seems pointless to get a Trac account now. BUT: since this very patch isn't going (at least theoretically) to have any bad effect on Habari , I'm attaching it to this very email for now. For you to test. Forgive me on this last thing but I haven't set up a test env just yet. I'll do it after sending this patch. <little INTRO and WHY> I'm cga. I'm a Linux SysAdmin and I'm all for "best practices" as much as I can. Not to mention I'm really paranoid and analytical when it concerns coding/installing/configuring/deploying. Especially for things I care/use/ administer. I basically started this little contribution because I really love my stack: debian + postgresql + nginx and I'd like to use it for all my needs. Habari included. Ref: http://drunkenmonkey.org/irc/habari/2011-06-19#T17-04-24 I think it'sfair enough for a small intro. <CONTRIB AND REASONING> As you can read in the chat log I linked above , I'm not expert in neither PostgreSQL nor SQL but I like to contribute for as much as I can. Especially if this can be the only way to keep Habari's support for PostgreSQL alive. This first little but important contribution it is all about the "CREATE SEQUENCE" used in PGSQL actual schema.sql. All the reasoning I do it's to be documented w/ links to official docs and (attacched) chat logs from freenode.#postgresql , where (beside docs and wiki) I learn and ask my stuff. And to my great pleasure I always find very competent and very helpful people. Please forgive my English errors, I'm Italian. Please *really* forgive my verbosity but I'm sure you'd understand that people who didn't follow in freenode.#habari don't know all the reasoning from the room. If I'm to be accepted I promise I'll be as concise as possible in next communications. Here we go: I noticed that you use a lot of serialization in schema.sql , w/o questioning why (which I'm not in a position to, anyway) I noticed that you (being whoever did that schema.sql) do it wrong. Why? Because: CREATE SEQUENCE done in the present form from schema.sql:CREATE SEQUENCE {$prefix}posts_pkey_seq; CREATE TABLE {$prefix}posts ( id BIGINT NOT NULL DEFAULT nextval('{$prefix}posts_pkey_seq'), ..... ..... PRIMARY KEY (id), ..... );1 - It's *almost* correct, but no cigar. It basically miss the most crucial part of the query to make the SEQUENCE itself have any sense *and* work properly:ALTER SEQUENCE {$prefix}posts_pkey_seq OWNED BY{$prefix}posts.id;
The OWNED BY argument was only added in 8.2. The last time I tested the PG support (admittedly a long time ago) Debian only had 8.1 in its repos, so that was a major deal. I'm not sure what is commonly available now (though I suspect 8.3 is widely used), but breaking support for 8.1 users is something to be considered, particularly since I see no real advantage to adding the statement in our use case.
Per the Postgres documentation <http://www.postgresql.org/docs/8.3/static/sql-createsequence.html> on CREATE SEQUENCE, OWNED BY merely associates the sequence with the table (and field) specified so that if the table (or field) is dropped the sequence will be automatically dropped as well. This sounds like a good idea at first, but I can see instances in which this might cause problems.
In the past there have been times that we've needed to modify a table, but one RDBMS or another won't support the right ALTER syntax, so we've created a new field, migrated data, dropped the original field, and then reversed the process. If we used OWNED BY on the sequence as you suggest the sequence would be automatically dropped and likely cause a ton of problems. Not only would we have to remember to re-create the sequence (manually and specifically for Postgres), but we would have to guess at what the starting value should be, otherwise we would generate duplicate IDs - a Very Bad Thing (tm).
So the only advantage we'd get is that if the table were to be dropped the sequence would be automatically deleted with it. That's nice for a user who is no longer using Habari, but hardly a major deal - it's an extra query per table to drop its relevant sequence or a few extra clicks in your GUI of choice. I can't see any instance in which we would be dropping the Posts table (or most of the others) programmatically, so it really provides no extra value for developers.
I don't understand this. Can you explain how it's ever going to behave in an unexpected way? How would a sequence ever result in duplicate numbers, unless it has been manually ALTERed with the RESTART argument? Even if we used OWNED BY, you could still screw the process up by manually changing the sequence, so that wouldn't solve the problem either.w/o this ALTER TABLE the SEQUENCE is to behave weird more sooner than later, resulting in duplicate numbers (see chat log; using BIGINT won't save you from this, more on this later) and errors that are more than diffcult to find when they happen.
The use of the sequence, whether OWNED BY or not, should have nothing to do with whether the destination field is an INT or a BIGINT.
2 - it determines in a recursive way the name of the SEUQNECE itself , which is wrong because you cannot be 100% sure that will actually work as intended. (see chat log).
I'm not sure I understand your point here either. The sequence name is dynamically created during install based on the DB prefix you specify, yes. Again, how is that going to cause a problem and (aside from using a SERIAL, as mentioned below) what other alternative is there?
I cannot speak towards any performance differences between 32bit and 64bit versions of Postgres, though I'm admittedly skeptical. We use BIGINT everywhere we can for ID columns (at least in MySQL and SQLite, looks like Postgres might have been missed) for one simple reason: we can. The storage difference between INT and BIGINT varies between platform but is so negligible we've never worried about it.3 - it uses BIGINT. My reasoning is that whoever created this schema.sql didn't wan't to finish the SEQUENCE too soon. Good idea , bad implementation. I strongly believe that BIGINT is not needed indeed, and that it's just a waste of cylces and space. Not to mention that BIGINT behaves differently on 32bit and 64bit, causing slowness (and perhaps some more side effects) on 32bit. (this is chat logs from the office, if you need them I'll attach them tomorrow). And as explained just above, it is not going to be used as a BIGINT anyway because of the wrong query CREATE SEQUENCE. But the main point why it is not needed it's because who's to write a post every second for # quintillions years? I think that a post every second for 68yrs is more than enough (INTEGER/SERIAL). (see chat log too).
On the other hand there are instances in which it may be of value to have the added capacity. A few off the top of my head:
1. Logs. Particularly if a plugin starts generating errors, you could conceivably start generating a TON of IDs. Over the course of many years running Habari it's not entirely outside the realm of possibility that you could exhaust an INT.
2. Posts that aren't posts. While most of us use Habari only for blogging, there are several instances where there have been a lot of other things built on top of it. Think product catalogs / online stores, forums, etc. It may seem crazy to create a new post every second for 68 years, but it seems a lot less crazy if you stop thinking about each entry as a time-consuming piece of written content. When you also include multiple types of content (like revisions, tweets, links, etc.) and consider that there might be a lot of turnover (or "churn") there's all the more reason to go ahead and use a BIGINT.
If an INT column were to be maxed out it's a simple matter for a DBA to go in and change it (after figuring out what the problem actually was, of course), but recreating the indexes could be a very time consuming process as well. I believe this is actually one of the primary reasons WordPress decided to switch to BIGINT fields many years ago.
So basically there are several reasons to use BIGINTs, even if they are long-shots, and no real reason not to (aside from your suggestion that there's a performance difference, which we've never heard of before). Think of it as a rather hopeful best practice.
4 - when all the above are to be combined w/ the use of NOT NULL DEFAULT nextval()" (and I mean all of them are compulsory clauses, see chat log), you can save a lot of hassles, code and cycles with just the improvement I'm submitting for every CREATE SEQUENCE query, leaving intact the final result for what it concerns Habari's functioning *and* improves things a whole lotta love:CREATE TABLE {$prefix}posts ( id SERIAL, ..... ..... PRIMARY KEY (id), ..... );
Ignoring everything in the last 3 points related to sequences, I'm really not sure why we never used SERIAL fields. From an implementation standpoint they are identical to our approach (the Postgres documentation <http://www.postgresql.org/docs/8.1/static/datatype.html#DATATYPE-SERIAL> even shows you the equivalent syntax using a regular sequence), but they would have been more analogous to the AUTO INCREMENT fields we use in MySQL and SQLite.
On the flip side, since they're the same there's likewise no real compelling reason to use SERIAL over a sequence. We had previously (and hopefully still do) wanted to also support Oracle, which is very similar to Postgres. Oracle lacks a SERIAL data type, so it would require the sequence method to be used anyway. Continuing to use sequences in Postgres would mean less variation between the two and make it a good bit easier to manage them.
<IMPROVEMENTS ACHIEVED> Improvements and optimizations achieved with this little patch: (see Ref. [2]) 0 - it will leave intact the need of Habari to use the numerical field for internal functioning. 1 - (id) will be unique and auto incremental by properties inherited from query. 2 - You won't skip a number in a SEQUENCE. 3 - It won't fail because of wrong creation query. 4 - It won't determine an unexpected SEQUENCE name/relation to the column you want it to. 5 - OWNED BY is to be used implicitly , this will drop anything related to the elected talbe and/or column. This sounds harsh, but what will you do w/ data that isn't associated w/ anything anymore? <FINAL THOUGHTS> 1 - For the impact on Habari's code (which I didn't read because I don't do php and couldn't afford to get it wrong) on insert the next value of the sequence into the serial column please read Ref. [2] just below the PostgreSQL 7.3 Note. 2 - This new schema.sql would work (in theory) for new installations but I don't know how to implement and test a system/schema/pgsql/upgrades/post/*.sql yet. Help and advice from you is very much apprecciated. 3 - Let me know if anyone is going to test this before I do. If you want. Anyway I'm going to test it when my test env will be up and running. 4 - I'm already planning the next batch of fixes but that can wait for now. I anticipate that it will be about optimizing queries and about keyvil issues as much as possible and w/o breaking Habari Ref. [3]. My pleasure to meet you. <REFERENCES> Ref. chat log attacched. Ref [1]. http://www.postgresql.org/docs/8.4/interactive/sql- createsequence.html Ref [2]. http://www.postgresql.org/docs/8.4/interactive/datatype- numeric.html#DATATYPE-SERIAL Ref. [3] http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-ii-7345 http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-iii-7365
-- To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/habari-dev
<<inline: compose-unknown-contact.jpg>>
