Re: [Ledger-smb-devel] Database updates
On Fri, Jul 15, 2011 at 10:03 AM, John Locke m...@freelock.com wrote:

Ok, so if we are dealing with a single database server, we could I suppose do a load, then a schema dump, and then compare against a schema dump from production (filtering out SET commands and comments).

Is that a built-in PG command?

Yeah. I'm not suggesting storing the schema in such a file, just a list of tables in an order they can be created that will load successfully (taking into account foreign keys, types, etc).

[snip]

Yes, that's the whole point of having a separate file listing this stuff -- so the base schema can be generated automatically, and re-generated to easily compare against what it should be. Essentially that file replaces Pg-database.sql. But we need to keep the data out of it, because that data may get changed after it's already in the database, making the schema dump inconsistent/not-comparable.

So I'm suggesting:
1. Schema index file, parsed by Perl/Bash/whatever
2. Base schema SQL file, auto-generated from a clean install by a script that parses #1
3. Base SQL data file, to populate menu tables, etc.

So how would this work exactly? Are you thinking one DDL file per table? I guess I am wondering if this would create more maintenance problems than it would solve. I mean, you have to store the DDL somewhere, or you have to generate it from some other format which is semantically reducible to the SQL DDL as supported by PostgreSQL. Quite frankly, if we are going that route, we should be using SQL DDL instead.

Hmm. Is there any way in Postgres to define column ordering via an ALTER statement? This is something MySQL supports ;-)

Nope: http://wiki.postgresql.org/wiki/Alter_column_position

Apparently nobody wants this badly enough to actually do the work required.
This might be an issue for developers, but for regular users, I think we just need to make sure we don't change the update scripts after creating them -- as long as people follow the upgrade procedures, wouldn't new columns get added to the end of the table? Or does this vary by Pg version?

Do we want to be limited to always adding new fields to the end of the table? I would rather have the base DDL files focused on clarity for developers if given the choice, because it means fewer errors and omissions.

Another issue that we'd run into has to do with self-joins. In general, the backup/restore processes from PostgreSQL are only guaranteed to work in the event you dump both the schema and data together in one file. This is an issue for tables like menu_node. It's certainly possible to get around this problem (by dropping the fkey, loading the data, and recreating it), but it means additional effort and complexity, and I think we'd have to weigh the benefit of doing so against the benefit of using well-tested tools to dump data in a reloadable format rather than maintaining that by hand.

So I guess I think we should start from a few more basic questions. Now that the menus should be pretty stable for the duration of 1.3 (absent bugs, which you'd probably want flagged), what would you expect such an output to look like that you can't get from a diff on various versions of Pg-database? I mean, what specifically is missing? Table names? Something else? Or is it a matter of noise from comparing dumps?

Best Wishes,
Chris Travers

--
AppSumo Presents a FREE Video for the SourceForge Community by Eric Ries, the creator of the Lean Startup Methodology on Lean Startup Secrets Revealed. This video shows you how to validate your ideas, optimize your ideas and identify your business strategy. http://p.sf.net/sfu/appsumosfdev2dev
___
Ledger-smb-devel mailing list
Ledger-smb-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ledger-smb-devel
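On the "noise from comparing dumps" question, a minimal sketch of the filtering described earlier in the thread (stripping SET commands and comments before diffing) might look like this. The database names are hypothetical and this is a discussion sketch, not a tested tool:

```shell
# Sketch: normalize a schema dump so two dumps can be diffed with less noise.
# Strips SET commands, SQL comments, and blank lines; everything else passes.
normalize_schema() {
    grep -v '^SET ' | grep -v '^--' | grep -v '^$'
}

# Hypothetical usage against two reachable databases:
#   pg_dump -s database1 | normalize_schema > /tmp/schema1.sql
#   pg_dump -s database2 | normalize_schema > /tmp/schema2.sql
#   diff /tmp/schema1.sql /tmp/schema2.sql
```

This still leaves column-order false positives (discussed below in the thread), but it removes the purely cosmetic differences between dumps taken on different servers.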
Re: [Ledger-smb-devel] Database updates
On 07/15/2011 12:04 PM, Chris Travers wrote:

Now that the menus should be pretty stable for the duration of 1.3 (absent bugs which you'd probably want flagged), what would you expect such an output to look like that you can't get from a diff on various versions of Pg-database? I mean, what specifically is missing? Table names? Something else? Or is it a matter of noise from comparing dumps?

It's a matter of comparing the actual schema in the database with the codebase. I'm looking for a quick way to tell whether failed upgrade scripts have left my database schema in the old state, or were just noisy about success, without having to go table-by-table to compare manually against Pg-database.sql.

Sure, it's easy to compare different versions of Pg-database.sql. But I recently skipped forward 2 weeks on trunk, ran all the new upgrade scripts, many reported errors, and there were several hundred lines of diffs in Pg-database.sql. I really do not want to spend hours trying to get my schema to match, figuring out which upgrades failed and which schema changes were not even in an upgrade script -- I'd much sooner spend those hours building a tool that will tell me, at any point in the future, where my schema does not match the code. I can't be alone in this...

So I'm suggesting:
1. Schema index file, parsed by Perl/Bash/whatever
2. Base schema SQL file, auto-generated from a clean install by a script that parses #1
3. Base SQL data file, to populate menu tables, etc.

So how would this work exactly? Are you thinking one DDL file per table? I guess I am wondering if this would create more maintenance problems than it would solve. I mean, you have to store the DDL somewhere, or you have to generate it from some other format which is semantically reducible to the SQL DDL as supported by PostgreSQL. Quite frankly, if we are going that route, we should be using SQL DDL instead.

No, I'm thinking one file for all of the tables, in a pre-determined order.
I want a quick way to tell if the schema I have in the database matches what LSMB expects. If I have extra tables in the database, what I have in mind would simply ignore them -- making it so you can have add-ons without breaking this...

I'm looking for an automated way of creating a Pg-database.sql file so I can compare it at any time against any (future) version of LSMB, and see what schema alterations are necessary. I already can compare two different versions of Pg-database.sql -- but I don't know how to quickly tell how my installed schema matches up, without going table-by-table.

I'm assuming by DDL you mean the SQL statements to create a table? Or some other format? I understand SQL, so that would be my preference, if you're talking about some other format of DDL...

Hmm. Is there any way in Postgres to define column ordering via an ALTER statement? This is something MySQL supports ;-)

Nope: http://wiki.postgresql.org/wiki/Alter_column_position

Apparently nobody wants this badly enough to actually do the work required.

This might be an issue for developers, but for regular users, I think we just need to make sure we don't change the update scripts after creating them -- as long as people follow the upgrade procedures, wouldn't new columns get added to the end of the table? Or does this vary by Pg version?

Do we want to be limited to always adding new fields to the end of the table? I would rather have the base DDL files focused on clarity for developers if given the choice, because it means fewer errors and omissions.

Ok, well, as a developer, I can handle it if diff shows me a column definition deleted in one line and added to another. I think we should take care to make upgrade scripts going forward work for less technical users, so that column sequence either matches and doesn't trigger a diff, or we provide a more sophisticated tool that understands the DDL.
(Personally I'd suggest we start with diff and just add columns to the end of the table for minor releases, and then perhaps refactor column orders at major releases.)

Another issue that we'd run into has to do with self-joins. In general, the backup/restore processes from PostgreSQL are only guaranteed to work in the event you dump both the schema and data together in one file. This is an issue for tables like menu_node. It's certainly possible to get around this problem (by dropping the fkey, loading the data, and recreating it), but it means additional effort and complexity, and I think we'd have to weigh the benefit of doing so against the benefit of using well-tested tools to dump data in a reloadable format rather than maintaining that by hand.

Ok. I've been using pg_dump --format=c / pg_restore -c for backups -- but that's a different problem than comparing the schema...

So I guess I think we should start from a few more basic questions: Now that the menus should be pretty stable for the duration of 1.3 (absent bugs which you'd probably
Re: [Ledger-smb-devel] Database updates
On Fri, Jul 15, 2011 at 3:18 PM, John Locke m...@freelock.com wrote:

So I'm suggesting:
1. Schema index file, parsed by Perl/Bash/whatever
2. Base schema SQL file, auto-generated from a clean install by a script that parses #1
3. Base SQL data file, to populate menu tables, etc.

So how would this work exactly? Are you thinking one DDL file per table? I guess I am wondering if this would create more maintenance problems than it would solve. I mean, you have to store the DDL somewhere, or you have to generate it from some other format which is semantically reducible to the SQL DDL as supported by PostgreSQL. Quite frankly, if we are going that route, we should be using SQL DDL instead.

No, I'm thinking one file for all of the tables, in a pre-determined order. I want a quick way to tell if the schema I have in the database matches what LSMB expects. If I have extra tables in the database, what I have in mind would simply ignore them -- making it so you can have add-ons without breaking this...

Ok, so if we have one file for all tables, wouldn't the index have to be generated from the base table script and not the other way around? If so, what's wrong with generating it from Pg-database.sql? I can't imagine the index file has enough info to generate the schema file by itself, or you wouldn't need an index.

There may be something else available right now that can help. The current test rig contains a large number of database tests which do not run by default but are production-safe. If I run:

    PGUSER=postgres PGPASSWORD='foo' PGDATABASE='lsmb13test6' LSMB_TEST_DB=1 make test

against an out-of-date test db, I get:

    Test Summary Report
    -------------------
    t/43-dbtest.t (Wstat: 1024 Tests: 148 Failed: 4)
      Failed tests: 54, 97, 104-105
      Non-zero exit status: 4
    Files=18, Tests=2563, 20 wallclock secs ( 0.54 usr 0.05 sys + 13.31 cusr 0.93 csys = 14.83 CPU)
    Result: FAIL
    Failed 1/18 test programs. 4/2563 subtests failed.
    make: *** [test_dynamic] Error 255

Just a note here on test safety: all tests are safe on production, with the exception of the LWP test rig, which has additional safeties that must be disabled before it can be run (i.e. it doesn't run unless you really, really, really tell it to). We have layers of safety here to ensure test data doesn't pollute a production database, including the fact that all scripts run in transactions and never commit, meaning that all inserts etc. roll back. These run mock-live tests against a production database and flag misbehaviors or problems found. In addition to looking at schemas, they also look at, for example, stored procedure overloading, table permissions (albeit in a cursory way), etc. More information about how to run test cases is found in README.tests.

My own preference would be to try to build additional tests into the database testing routines if there is a need to do so. After all, if we are looking for a way to test, we should probably test automatically. Of course this involves highly recommending test rigs as a support tool.

If I were trying to do this, what I would do is:
1) Do a clean load into a designated test database environment.
2) Connect to both databases using psql, and list all tables and types in the public schema.
3) List the schema for each table, parsing the psql output and sorting consistently, and then comparing.
4) Each comparison would be a test case. Tables only in the production db would be ignored. Tables only in the test base db would be failures.
5) Clean up the test db afterwards.

1 and 5 could be accomplished by adding an interface to the relevant test scripts for db creation and teardown. The others would have to involve another test script of some sort. This is a bit of work.
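The steps above could be sketched roughly as follows. The database names are hypothetical and the psql/pg_dump usage is untested here; only the list comparison itself is plain shell:

```shell
# Sketch of steps 2-4: compare sorted table lists from a base (clean-load)
# database and a production database.  Tables only in production are ignored;
# tables only in the base db are failures.
missing_tables() {
    # $1 = sorted table list from the base db, $2 = sorted list from production;
    # prints tables present in base but absent from production.
    comm -23 "$1" "$2"
}

# The lists themselves would come from psql (hypothetical database names):
#   psql -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY 1" lsmb_base > base.lst
#   psql -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY 1" lsmb_prod > prod.lst
# and each table present in both could then be diffed with:
#   diff <(pg_dump -s -t "$t" lsmb_base) <(pg_dump -s -t "$t" lsmb_prod)
```

Each `missing_tables` hit and each non-empty per-table diff would map naturally onto one test case in the rig.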
So my first question is: are the test rigs themselves (which test stored procedure behavior, function overloading, and some permissions issues) sufficiently inadequate to justify putting other stuff on hold while coding something like this?

One caveat about the test framework is that some things will likely not have test cases written until they are nearing completion, for the reason that desired behavior in the lower levels of a program can change a fair bit between first sitting down to work on something and getting it ready to put into production.

I'm looking for an automated way of creating a Pg-database.sql file so I can compare it at any time against any (future) version of LSMB, and see what schema alterations are necessary. I already can compare two different versions of Pg-database.sql -- but I don't know how to quickly tell how my installed schema matches up, without going table-by-table.

pg_dump -s [database_name] will get you most of the way there (it's a backup of the database structure). Note, however, that you'd want to compare against a similar dump taken from a new installation, for the reason that it will have a lot of stuff from contrib that's not in the original file.
Re: [Ledger-smb-devel] Database updates
Hi,

On 07/13/2011 03:15 PM, Chris Travers wrote:

Ok, two pieces of quick feedback. First, right now I would highly recommend that if you have to follow schema changes, you join the ledger-smb-commits list on SourceForge. That will help a lot right there.

Well, with git I'm not having any trouble seeing what's changed. What I do:
1. create a tag of my current local branch
2. merge changes from upstream
3. git diff tagname sql/ and get all the schema changes.

Problem is, there have been quite a few! And so I've just been relying on the upgrade scripts -- but some of these have failed, and I don't think they're complete.

Secondly, a tool to compare schemas will be helpful, and I would like to create one in the near future. In the meantime, as a simple discussion draft, let me post the following bash code:

    rm -f ../testdump1.sql ../testdump2.sql
    for line in `grep -i '^create table' sql/Pg-database.sql | sed -e 's/(.*$//' | awk '{ print $3 }'`
    do
        pg_dump -U postgres -s -t $line database1 | grep -v GRANT | grep -v REVOKE >> ../testdump1.sql
    done
    for line in `grep -i '^create table' sql/Pg-database.sql | sed -e 's/(.*$//' | awk '{ print $3 }'`
    do
        pg_dump -U postgres -s -t $line database2 | grep -v GRANT | grep -v REVOKE >> ../testdump2.sql
    done
    diff ../testdump1.sql ../testdump2.sql

If this sort of result is found to be acceptable, I could probably port the first two parts to Perl pretty easily. Not sure about diffs, though.

That sounds like a good start. I'm thinking, though, that Pg-database.sql is still too big and monolithic for this -- and the resulting dump file ought to be where you do the initial schema load, don't you think?

Maybe have a separate file, sort of like an ini file or a parsable load order, that can identify the type of relation and the name, e.g.:

    table acc_trans
    table account
    type _xx_xx
    view chart
    ... etc.

This could then be fed to the dump script, instead of parsing Pg-database.sql...
Then we can have the tables ordered in a way that will actually load, with foreign keys to tables previously defined, etc. And as long as that's consistent, then I think diff is fine for now for seeing what's changed... we can just run the dump script, direct the output to a tmp file, and compare against the one in the code tree.

Cheers,
John Locke
http://www.freelock.com
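A parser for the kind of load-order file John sketches could be very small. The file format and script below are hypothetical, offered only to make the proposal concrete; the pg_dump usage is commented out because it needs a live database:

```shell
# Sketch: read a load-order file of "<type> <name>" lines (table/view/type)
# and emit the object names in file order.
parse_loadorder() {
    awk '$1 == "table" || $1 == "view" || $1 == "type" { print $2 }' "$1"
}

# Hypothetical usage for the table entries (pg_dump -t applies to tables/views,
# not types, so types would need separate handling):
#   for tbl in $(awk '$1 == "table" { print $2 }' sql/loadorder.txt); do
#       pg_dump -U postgres -s -t "$tbl" database1 >> ../testdump1.sql
#   done
```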
Re: [Ledger-smb-devel] Database updates
On Thu, Jul 14, 2011 at 2:31 PM, John Locke m...@freelock.com wrote:

Hi,

On 07/13/2011 03:15 PM, Chris Travers wrote:

Ok, two pieces of quick feedback. First, right now I would highly recommend that if you have to follow schema changes, you join the ledger-smb-commits list on SourceForge. That will help a lot right there.

Well, with git I'm not having any trouble seeing what's changed. What I do:
1. create a tag of my current local branch
2. merge changes from upstream
3. git diff tagname sql/ and get all the schema changes.

Problem is, there have been quite a few! And so I've just been relying on the upgrade scripts -- but some of these have failed, and I don't think they're complete.

Secondly, a tool to compare schemas will be helpful, and I would like to create one in the near future. In the meantime, as a simple discussion draft, let me post the following bash code:

    rm -f ../testdump1.sql ../testdump2.sql
    for line in `grep -i '^create table' sql/Pg-database.sql | sed -e 's/(.*$//' | awk '{ print $3 }'`
    do
        pg_dump -U postgres -s -t $line database1 | grep -v GRANT | grep -v REVOKE >> ../testdump1.sql
    done
    for line in `grep -i '^create table' sql/Pg-database.sql | sed -e 's/(.*$//' | awk '{ print $3 }'`
    do
        pg_dump -U postgres -s -t $line database2 | grep -v GRANT | grep -v REVOKE >> ../testdump2.sql
    done
    diff ../testdump1.sql ../testdump2.sql

If this sort of result is found to be acceptable, I could probably port the first two parts to Perl pretty easily. Not sure about diffs, though.

That sounds like a good start. I'm thinking, though, that Pg-database.sql is still too big and monolithic for this -- and the resulting dump file ought to be where you do the initial schema load, don't you think?

Ok, so if we are dealing with a single database server, we could I suppose do a load, then a schema dump, and then compare against a schema dump from production (filtering out SET commands and comments). That won't give you everything -- for example, menu changes...
But with menu changes, if you have any custom entries (which are often created by add-ons), you can't simply import the different records, so the update scripts have to be relied on there. And typically the way this works is to run the update scripts on a new database, then dump the tables and copy them back into Pg-database.sql.

Maybe have a separate file, sort of like an ini file or a parsable load order, that can identify the type of relation and the name, e.g.:

I would REALLY like to avoid having multiple authoritative sources of the database schema if possible. It's very easy for those to get out of sync.

    table acc_trans
    table account
    type _xx_xx
    view chart
    ... etc.

This could then be fed to the dump script, instead of parsing Pg-database.sql...

If that's what we want, I think we'd want to automatically generate it from the SQL files in question rather than maintain it separately.

Then we can have the tables ordered in a way that will actually load, with foreign keys to tables previously defined, etc. And as long as that's consistent, then I think diff is fine for now for seeing what's changed... we can just run the dump script, direct the output to a tmp file, and compare against the one in the code tree.

Also, there is a second issue here: column ordering is not guaranteed when ALTER TABLE commands are used. So no matter how we do it, there will be false positives that have to be reviewed and dismissed.

Best wishes,
Chris Travers
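Generating such an index automatically from Pg-database.sql, as Chris suggests, could reuse the grep pipeline from the earlier discussion draft. A rough sketch for the table entries only (views and types would need their own patterns; the output file name is made up):

```shell
# Sketch: derive "table <name>" index lines from the CREATE TABLE statements
# in a monolithic SQL file, in the order they appear -- which is already a
# load order that satisfies the foreign keys.
generate_index() {
    grep -i '^create table' "$1" | sed -e 's/(.*$//' | awk '{ print "table", $3 }'
}

# Hypothetical usage:
#   generate_index sql/Pg-database.sql > sql/loadorder.txt
```

Keeping Pg-database.sql as the single authoritative source and deriving the index from it avoids the out-of-sync problem raised above.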
[Ledger-smb-devel] Database updates
Hi,

A couple of issues/thoughts:
1. Errors on upgrade SQL
2. Normalize/compare installed schema vs. files?

1. Just updating my instance using the upgrade SQL files, I got these errors:

    freelockco=# \i sql/upgrade/3497-schema-changes.sql
    psql:sql/upgrade/3497-schema-changes.sql:1: ERROR: syntax error at or near "trans_id"
    LINE 1: ALTER TABLE invoice ADD FOREIGN KEY trans_id REFERENCES tran...
                                                ^
    psql:sql/upgrade/3497-schema-changes.sql:2: ERROR: syntax error at or near "parts_id"
    LINE 1: ALTER TABLE invoice ADD FOREIGN KEY parts_id REFERENCES part...
                                                ^
    psql:sql/upgrade/3497-schema-changes.sql:4: ERROR: syntax error at or near "chart_id"
    LINE 1: ALTER TABLE tax ADD FOREIGN KEY chart_id REFERENCES account(...
                                            ^
    psql:sql/upgrade/3497-schema-changes.sql:5: ERROR: syntax error at or near ";"
    LINE 1: ...IGGER ap_audit_trail AFTER insert or update or delete ON ap;
                                                                          ^
    ...

I'm using PostgreSQL 8.3.12, if it makes any difference. I did see some similar errors a few days ago, when updating trunk from a ~2-week-old version. Which brings me to part 2:

2. What would be the best way to get a consistent, repeatable text dump of the table schema only, along with views, types, and triggers? I'm thinking back to an earlier discussion about database updates, and having a shell tool to manage them. I'm thinking we should be able to dump the structural parts of the database to a single file (e.g. Pg-database.sql) that we can then easily compare to what's in the code base. I assume this would mean splitting out the functions and the data that's currently loaded into Pg-database.sql into separate files -- but once that's done, it should be trivial to see how close the structure of your database is to the current release, and what needs to be altered to get it there. Which would be the first step to auto-generating update scripts.

Thoughts?
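The split described in part 2 could be prototyped crudely. This assumes single-line INSERT statements (true of simple data loads, but not of every statement in a real dump), so it is a sketch of the idea rather than a working splitter, and the output file names are made up:

```shell
# Sketch: separate single-line INSERT statements (menu data, etc.) from the
# structural DDL in a monolithic SQL file, writing two files in the cwd.
split_sql() {
    grep -vi '^insert into' "$1" > schema-only.sql
    grep -i '^insert into' "$1" > data-only.sql
}
```

A real splitter would also have to route functions into their own file and cope with multi-line statements, which likely means a proper SQL-aware parser rather than grep.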
Cheers,
John Locke
http://www.freelock.com
Re: [Ledger-smb-devel] Database updates
On Wed, Jul 13, 2011 at 2:04 PM, John Locke m...@freelock.com wrote:

Hi,

A couple of issues/thoughts:
1. Errors on upgrade SQL

Those were reported to me this a.m. I corrected them in svn rev 3500. Exhibit A on why never to commit during short intervals at airports. Sorry for the inconvenience.

2. Normalize/compare installed schema vs. files?

1. Just updating my instance using the upgrade SQL files, I got these errors:

    freelockco=# \i sql/upgrade/3497-schema-changes.sql
    psql:sql/upgrade/3497-schema-changes.sql:1: ERROR: syntax error at or near "trans_id"
    LINE 1: ALTER TABLE invoice ADD FOREIGN KEY trans_id REFERENCES tran...
                                                ^
    psql:sql/upgrade/3497-schema-changes.sql:2: ERROR: syntax error at or near "parts_id"
    LINE 1: ALTER TABLE invoice ADD FOREIGN KEY parts_id REFERENCES part...
                                                ^
    psql:sql/upgrade/3497-schema-changes.sql:4: ERROR: syntax error at or near "chart_id"
    LINE 1: ALTER TABLE tax ADD FOREIGN KEY chart_id REFERENCES account(...
                                            ^
    psql:sql/upgrade/3497-schema-changes.sql:5: ERROR: syntax error at or near ";"
    LINE 1: ...IGGER ap_audit_trail AFTER insert or update or delete ON ap;
                                                                          ^
    ...

I'm using PostgreSQL 8.3.12, if it makes any difference. I did see some similar errors a few days ago, when updating trunk from a ~2-week-old version. Which brings me to part 2:

2. What would be the best way to get a consistent, repeatable text dump of the table schema only, along with views, types, and triggers? I'm thinking back to an earlier discussion about database updates, and having a shell tool to manage them. I'm thinking we should be able to dump the structural parts of the database to a single file (e.g. Pg-database.sql) that we can then easily compare to what's in the code base. I assume this would mean splitting out the functions and the data that's currently loaded into Pg-database.sql into separate files -- but once that's done, it should be trivial to see how close the structure of your database is to the current release, and what needs to be altered to get it there.
Which would be the first step to auto-generating update scripts. Thoughts?

Let me think about how best to do this. It's slightly complicated by the presence of the external dependencies from pg_contrib. Ideally I'd like to have something that's easily repeatable and generated from an authoritative source. I have an idea; will try this later today.

Best Wishes,
Chris Travers

Cheers,
John Locke
http://www.freelock.com
Re: [Ledger-smb-devel] Database updates
Ok, two pieces of quick feedback.

First, right now I would highly recommend that if you have to follow schema changes, you join the ledger-smb-commits list on SourceForge. That will help a lot right there.

Secondly, a tool to compare schemas will be helpful, and I would like to create one in the near future. In the meantime, as a simple discussion draft, let me post the following bash code:

    rm -f ../testdump1.sql ../testdump2.sql
    for line in `grep -i '^create table' sql/Pg-database.sql | sed -e 's/(.*$//' | awk '{ print $3 }'`
    do
        pg_dump -U postgres -s -t $line database1 | grep -v GRANT | grep -v REVOKE >> ../testdump1.sql
    done
    for line in `grep -i '^create table' sql/Pg-database.sql | sed -e 's/(.*$//' | awk '{ print $3 }'`
    do
        pg_dump -U postgres -s -t $line database2 | grep -v GRANT | grep -v REVOKE >> ../testdump2.sql
    done
    diff ../testdump1.sql ../testdump2.sql

If this sort of result is found to be acceptable, I could probably port the first two parts to Perl pretty easily. Not sure about diffs, though.

Best Wishes,
Chris Travers