Re: [Ledger-smb-devel] Database updates
On Fri, Jul 15, 2011 at 10:03 AM, John Locke m...@freelock.com wrote:

> Ok, so if we are dealing with a single database server, we could, I
> suppose, do a load, then take a schema dump, and then compare it
> against a schema dump from production (filtering out SET commands and
> comments). Is that a built-in PG command?

Yeah.

> I'm not suggesting storing the schema in such a file, just a list of
> tables in an order in which they can be created so that they will load
> successfully (taking into account foreign keys, types, etc.).

[snip]

> Yes, that's the whole point of having a separate file listing this
> stuff -- so the base schema can be generated automatically, and
> re-generated to easily compare against what it should be. Essentially
> that file replaces Pg-database.sql. But we need to keep the data out
> of it, because that data may get changed after it's already in the
> database, making the schema dump inconsistent/not-comparable.
>
> So I'm suggesting:
>
> 1. A schema index file, parsed by Perl/Bash/whatever
> 2. A base schema SQL file, auto-generated from a clean install by a
>    script that parses #1
> 3. A base SQL data file, to populate menu tables, etc.

So how would this work exactly? Are you thinking one DDL file per
table? I guess I am wondering if this would create more maintenance
problems than it would solve. I mean you have to store the DDL
somewhere, or you have to generate it from some other format which is
semantically reducible to the SQL DDL as supported by PostgreSQL.
Quite frankly, if we are going that route, we should be using SQL DDL
instead.

> Hmm. Is there any way in Postgres to define column ordering via an
> alter statement? This is something MySQL supports ;-)

Nope: http://wiki.postgresql.org/wiki/Alter_column_position

Apparently nobody wants this badly enough to actually do the work
required.
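The load/dump/filter/compare idea above can be scripted directly. A
minimal sketch, assuming placeholder database names lsmb_prod
(production) and lsmb_ref (a clean load); the filter patterns are
illustrative, not any existing LedgerSMB tooling:

```shell
#!/bin/sh
# Normalize schema-only dumps so two databases can be diffed cleanly.
# "lsmb_prod" and "lsmb_ref" are placeholder database names.

# Strip SET commands, SQL comments, and blank lines, which vary
# between dumps but carry no schema information.
filter_dump () {
    grep -v '^SET ' | grep -v '^--' | grep -v '^$'
}

pg_dump -s lsmb_prod | filter_dump > prod-schema.sql
pg_dump -s lsmb_ref  | filter_dump > ref-schema.sql
diff -u ref-schema.sql prod-schema.sql
```

diff exits 0 when the filtered schemas match, so the whole pipeline
can also serve as a pass/fail check from another script.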
> This might be an issue for developers, but for regular users, I think
> we just need to make sure we don't change the update scripts after
> creating them -- as long as people follow the upgrade procedures,
> wouldn't new columns get added to the end of the table? Or does this
> vary by Pg version?

Do we want to be limited to always adding new fields to the end of the
table? Given the choice, I would rather have the base DDL files focused
on clarity for developers, because that means fewer errors and
omissions.

Another issue we'd run into has to do with self-joins. In general, the
backup/restore processes from PostgreSQL are only guaranteed to work if
you dump the schema and data together in one file. This is an issue for
tables like menu_node. It's certainly possible to get around this
problem (by dropping the fkey, loading the data, and recreating the
fkey), but it means additional effort and complexity, and I think we'd
have to weigh the benefit of doing so against the benefit of using
well-tested tools to dump data in a reloadable format rather than
maintaining that by hand.

So I think we should start from a few more basic questions. Now that
the menus should be pretty stable for the duration of 1.3 (absent bugs,
which you'd probably want flagged), what would you expect such an
output to look like that you can't get from a diff on various versions
of Pg-database.sql? I mean, what specifically is missing? Table names?
Something else? Or is it a matter of noise from comparing dumps?

Best Wishes,
Chris Travers

--
AppSumo Presents a FREE Video for the SourceForge Community by Eric
Ries, the creator of the Lean Startup Methodology, on "Lean Startup
Secrets Revealed." This video shows you how to validate your ideas,
optimize your ideas, and identify your business strategy.
http://p.sf.net/sfu/appsumosfdev2dev
___
Ledger-smb-devel mailing list
Ledger-smb-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ledger-smb-devel
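For reference, the drop-fkey/load/recreate workaround for self-joining
tables mentioned in the message above might look roughly like the
following sketch. The constraint name (menu_node_parent_fkey), column
names (parent, id), and data file name are illustrative guesses, not
LedgerSMB's actual definitions:

```shell
#!/bin/sh
# Load self-referencing data (e.g. menu_node) from a separate data
# file by temporarily removing the self-join fkey. Constraint name,
# column names, and file name are illustrative placeholders.

psql -d lsmb_prod <<'SQL'
BEGIN;
-- Drop the self-referencing fkey so rows can load in any order.
ALTER TABLE menu_node DROP CONSTRAINT menu_node_parent_fkey;
\copy menu_node FROM 'menu_node.csv' WITH CSV
-- Recreate it; PostgreSQL re-validates every parent reference here,
-- so a bad row aborts the transaction and nothing is committed.
ALTER TABLE menu_node
    ADD CONSTRAINT menu_node_parent_fkey
    FOREIGN KEY (parent) REFERENCES menu_node (id);
COMMIT;
SQL
```

Because the whole thing runs in one transaction, a failed re-validation
leaves the original constraint and data untouched.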
Re: [Ledger-smb-devel] Database updates
On 07/15/2011 12:04 PM, Chris Travers wrote:

> Now that the menus should be pretty stable for the duration of 1.3
> (absent bugs, which you'd probably want flagged), what would you
> expect such an output to look like that you can't get from a diff on
> various versions of Pg-database.sql? I mean, what specifically is
> missing? Table names? Something else? Or is it a matter of noise from
> comparing dumps?

It's a matter of comparing the actual schema in the database with the
codebase. I'm looking for a quick way to tell whether the failed
upgrade scripts have left my database schema in the old state, or were
just noisy about success, without having to go table-by-table to
compare manually against Pg-database.sql.

Sure, it's easy to compare different versions of Pg-database.sql. But I
recently skipped forward two weeks on trunk, ran all the new upgrade
scripts, many reported errors, and there were several hundred lines of
diffs in Pg-database.sql. I really do not want to spend hours trying to
get my schema to match, figuring out which upgrades failed and which
schema changes were not even in an upgrade script -- I'd much sooner
spend those hours building a tool that will tell me, at any point in
the future, where my schema does not match the code. I can't be alone
in this...

>> So I'm suggesting:
>>
>> 1. A schema index file, parsed by Perl/Bash/whatever
>> 2. A base schema SQL file, auto-generated from a clean install by a
>>    script that parses #1
>> 3. A base SQL data file, to populate menu tables, etc.
>
> So how would this work exactly? Are you thinking one DDL file per
> table? I guess I am wondering if this would create more maintenance
> problems than it would solve. I mean you have to store the DDL
> somewhere, or you have to generate it from some other format which is
> semantically reducible to the SQL DDL as supported by PostgreSQL.
> Quite frankly, if we are going that route, we should be using SQL DDL
> instead.

No, I'm thinking one file for all of the tables, in a pre-determined
order.
I want a quick way to tell whether the schema I have in the database
matches what LSMB expects. If I have extra tables in the database, what
I have in mind would simply ignore them -- making it so you can have
add-ons without breaking this...

I'm looking for an automated way of creating a Pg-database.sql file so
I can compare it at any time against any (future) version of LSMB, and
see what schema alterations are necessary. I can already compare two
different versions of Pg-database.sql -- but I don't know how to
quickly tell how my installed schema matches up, without going
table-by-table.

I'm assuming by DDL you mean the SQL statements to create a table? Or
some other format? I understand SQL, so that would be my preference, if
you're talking about some other format of DDL...

>> Hmm. Is there any way in Postgres to define column ordering via an
>> alter statement? This is something MySQL supports ;-)
>
> Nope: http://wiki.postgresql.org/wiki/Alter_column_position
>
> Apparently nobody wants this badly enough to actually do the work
> required.

>> This might be an issue for developers, but for regular users, I
>> think we just need to make sure we don't change the update scripts
>> after creating them -- as long as people follow the upgrade
>> procedures, wouldn't new columns get added to the end of the table?
>> Or does this vary by Pg version?
>
> Do we want to be limited to always adding new fields to the end of
> the table? Given the choice, I would rather have the base DDL files
> focused on clarity for developers, because that means fewer errors
> and omissions.

Ok, well, as a developer, I can handle it if diff shows me a column
definition deleted on one line and added on another. I think we should
take care to make upgrade scripts going forward work for less technical
users, so that column sequence either matches and doesn't trigger a
diff, or we provide a more sophisticated tool that understands the DDL.
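One way to sidestep the column-ordering problem entirely, as a sketch
(the database name lsmb_prod is a placeholder): instead of diffing
CREATE TABLE statements, dump the column catalog sorted by column name,
so a column that has merely moved position produces no diff at all:

```shell
#!/bin/sh
# Dump each table's columns sorted by name rather than by position, so
# diffs ignore column order. "lsmb_prod" is a placeholder database
# name; run the same command against a reference database and diff the
# two output files.

psql -At -d lsmb_prod -c "
    SELECT table_name || '|' || column_name || '|' || data_type
      FROM information_schema.columns
     WHERE table_schema = 'public'
     ORDER BY table_name, column_name
" > prod-columns.txt
```

This catches added, dropped, and retyped columns while staying silent
about pure reordering, which seems to be exactly the noise being
complained about here.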
(Personally, I'd suggest we start with diff and just add columns to the
end of the table for minor releases, and then perhaps refactor column
orders at major releases.)

> Another issue we'd run into has to do with self-joins. In general,
> the backup/restore processes from PostgreSQL are only guaranteed to
> work if you dump the schema and data together in one file. This is an
> issue for tables like menu_node. It's certainly possible to get
> around this problem (by dropping the fkey, loading the data, and
> recreating the fkey), but it means additional effort and complexity,
> and I think we'd have to weigh the benefit of doing so against the
> benefit of using well-tested tools to dump data in a reloadable
> format rather than maintaining that by hand.

Ok. I've been using pg_dump --format=c / pg_restore -c for backups --
but that's a different problem than comparing the schema...

> So I think we should start from a few more basic questions. Now that
> the menus should be pretty stable for the duration of 1.3 (absent
> bugs, which you'd probably want flagged), what would you expect such
> an output to look like that you can't get from a diff on various
> versions of Pg-database.sql?
Re: [Ledger-smb-devel] Database updates
On Fri, Jul 15, 2011 at 3:18 PM, John Locke m...@freelock.com wrote:

>>> So I'm suggesting:
>>>
>>> 1. A schema index file, parsed by Perl/Bash/whatever
>>> 2. A base schema SQL file, auto-generated from a clean install by a
>>>    script that parses #1
>>> 3. A base SQL data file, to populate menu tables, etc.
>>
>> So how would this work exactly? Are you thinking one DDL file per
>> table? I guess I am wondering if this would create more maintenance
>> problems than it would solve. I mean you have to store the DDL
>> somewhere, or you have to generate it from some other format which
>> is semantically reducible to the SQL DDL as supported by PostgreSQL.
>> Quite frankly, if we are going that route, we should be using SQL
>> DDL instead.
>
> No, I'm thinking one file for all of the tables, in a pre-determined
> order. I want a quick way to tell whether the schema I have in the
> database matches what LSMB expects. If I have extra tables in the
> database, what I have in mind would simply ignore them -- making it
> so you can have add-ons without breaking this...

Ok, so if there is one file for all tables, wouldn't the index have to
be generated from the base table script, and not the other way around?
If so, what's wrong with generating it from Pg-database.sql? I can't
imagine the index file has enough info to generate the schema file by
itself, or you wouldn't need an index.

There may be something else available right now that can help. The
current test rig contains a large number of database tests which do not
run by default but are production safe. If I run:

    PGUSER=postgres PGPASSWORD='foo' PGDATABASE='lsmb13test6' \
        LSMB_TEST_DB=1 make test

against an out-of-date test db, I get:

    Test Summary Report
    -------------------
    t/43-dbtest.t (Wstat: 1024 Tests: 148 Failed: 4)
      Failed tests: 54, 97, 104-105
      Non-zero exit status: 4
    Files=18, Tests=2563, 20 wallclock secs ( 0.54 usr 0.05 sys +
      13.31 cusr 0.93 csys = 14.83 CPU)
    Result: FAIL
    Failed 1/18 test programs. 4/2563 subtests failed.
    make: *** [test_dynamic] Error 255

Just a note here on test safety: all tests are safe on production, with
the exception of the LWP test rig, which has additional safeties that
must be disabled before it can be run (i.e., it doesn't run unless you
really, really, really tell it to). We have layers of safety here to
ensure test data doesn't pollute a production database, including the
fact that all scripts run in transactions and never commit, meaning
that all inserts etc. roll back.

These run mock-live tests against a production database and flag
misbehaviors or problems found. In addition to looking at schemas, they
also look at, for example, stored procedure overloading, table
permissions (albeit in a cursory way), etc. More information about how
to run test cases is found in README.tests.

My own preference would be to build additional tests into the database
testing routines if there is a need to do so. After all, if we are
looking for a way to test, we should probably test automatically. Of
course this involves highly recommending the test rigs as a support
tool.

If I were trying to do this, here is what I would do:

1) Do a clean load into a designated test database environment.
2) Connect to both databases using psql, and list all tables and types
   in the public schema.
3) List the schema for each table, parsing the psql output and sorting
   consistently, and then compare.
4) Treat each comparison as a test case. Tables only in the production
   db would be ignored; tables only in the test base db would be
   failures.
5) Clean up the test db afterwards.

Steps 1 and 5 could be accomplished by adding an interface to the
relevant test scripts for db creation and teardown. The others would
have to involve another test script of some sort. This is a bit of
work.
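A rough shell sketch of those five steps (database names, the schema
file path, and the FAIL reporting are placeholders; a real version
would presumably live inside the existing test rig):

```shell
#!/bin/sh
# Sketch of the five steps above. "lsmb_ref" is a throwaway reference
# database; "lsmb_prod" is the database being checked.

createdb lsmb_ref                               # step 1: clean load
psql -q -d lsmb_ref -f sql/Pg-database.sql

# step 2: list tables in the public schema of a given database
tables () {
    psql -At -d "$1" -c \
        "SELECT tablename FROM pg_tables
          WHERE schemaname = 'public' ORDER BY 1"
}

# steps 3/4: every table in the reference db must exist in production;
# extra production tables (add-ons) are ignored
for t in $(tables lsmb_ref); do
    tables lsmb_prod | grep -qx "$t" \
        || echo "FAIL: table $t missing from production"
done

dropdb lsmb_ref                                 # step 5: clean up
```

This only checks table existence; step 3's per-table schema comparison
would additionally parse and sort each table's `psql -c '\d ...'`
output before diffing.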
So my first question is: are the test rigs themselves (which test
stored procedure behavior, function overloading, and some permissions
issues) sufficiently inadequate to justify putting other stuff on hold
while coding something like this?

One caveat about the test framework: some things will likely not have
test cases written until they are nearing completion, because desired
behavior in the lower levels of a program can change a fair bit between
first sitting down to work on something and getting it ready to put
into production.

> I'm looking for an automated way of creating a Pg-database.sql file
> so I can compare it at any time against any (future) version of LSMB,
> and see what schema alterations are necessary. I can already compare
> two different versions of Pg-database.sql -- but I don't know how to
> quickly tell how my installed schema matches up, without going
> table-by-table.

pg_dump -s [database_name] will get you most of the way there (it's a
backup of the database structure). Note, however, that you'd want to
compare against a similar dump taken from a new installation, because
your database will have a lot of stuff from contrib that's not in the
original file.
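Putting that advice together, a minimal sketch (database names are
placeholders; the key point is loading the same contrib modules into
the reference database that production uses, or the diff will be
dominated by them):

```shell
#!/bin/sh
# Compare an installed schema against a freshly loaded reference
# database rather than against Pg-database.sql directly, so contrib
# objects appear on both sides of the diff. Names are placeholders.

createdb lsmb_ref
psql -q -d lsmb_ref -f sql/Pg-database.sql
# ...load the same contrib modules here that production has...

pg_dump -s lsmb_prod > prod-schema.sql
pg_dump -s lsmb_ref  > ref-schema.sql
diff -u ref-schema.sql prod-schema.sql > schema.diff || true

wc -l schema.diff    # an empty diff means the schemas match
dropdb lsmb_ref
```

The `|| true` keeps the script running when diff exits nonzero on
differences, so the result file can be inspected afterwards.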
[Ledger-smb-devel] Existing test runs, was re: Database updates
Just as another note on test safety, the following are listed in
decreasing order of safety:

1) Doesn't touch the database:

       make test

2) Touches the database, but is never intended to commit. Direct SQL
   scripts partition data using negative ids to the extent possible,
   in case of an accidental commit:

       PGUSER=postgres PGPASSWORD='foo' PGDATABASE='lsmb13test6' \
           LSMB_TEST_DB=1 make test

3) Has a number of safety features built in, but errors in the code
   COULD potentially drop a database that currently exists (some such
   errors were recently corrected). Not recommended in production
   environments. Runs the same tests as above, but against a new
   database freshly loaded from Pg-database and sql/modules, before
   cleaning itself up:

       PGPASSWORD='foo' LSMB_NEW_DB='foo' PGUSER=postgres \
           LSMB_TEST_DB=1 PG_CONTRIB_DIR=/usr/share/pgsql/contrib/ \
           LSMB_COUNTRY_CODE='us' LSMB_LOAD_COA='UCOA-Form990' \
           LSMB_ADMIN_USERNAME=test123 LSMB_ADMIN_PASSWORD=testing \
           LSMB_ADMIN_FNAME='Test' LSMB_ADMIN_LNAME='Test' \
           LSMB_LOAD_GIFI=UCOA-Form990 make test

4) May write to the db. DO NOT RUN AGAINST PRODUCTION:

       LSMB_USER='foo' LSMB_PASS='bar' PGDATABASE='baz' \
           LSMB_BASE_URL='http://localhost/ledgersmb_13/' \
           LSMB_TEST_LWP=1 make test

Best Wishes,
Chris Travers