Re: [Ledger-smb-devel] Database updates

2011-07-15 Thread Chris Travers
On Fri, Jul 15, 2011 at 10:03 AM, John Locke m...@freelock.com wrote:

 Ok, so if we are dealing with a single database server, we could, I
 suppose, do a load and then a schema dump, and then compare that
 against a schema dump from production (filtering out set commands and
 comments).

 Is that a built-in PG command?

Yeah.
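
pg_dump -s gives you a schema-only dump.  For instance (untested
sketch; the database names are made up):

pg_dump -s lsmb_fresh | grep -vE '^(SET |--|$)' > fresh-schema.sql
pg_dump -s lsmb_prod  | grep -vE '^(SET |--|$)' > prod-schema.sql
diff -u fresh-schema.sql prod-schema.sql

The grep strips the set commands, comments, and blank lines so the
diff is mostly signal.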



 I'm not suggesting storing the schema in such a file, just a list of
 tables in an order they can be created that will load successfully
 (taking into account foreign keys, types, etc).

snip

 Yes, that's the whole point of having a separate file listing this stuff
 -- so the base schema can be generated automatically. And re-generated
 to easily compare against what it should be.

 Essentially that file replaces Pg-database.sql. But we need to keep the
 data out of it, because that data may get changed after it's already in
 the database, making the schema dump inconsistent/not-comparable.

 So I'm suggesting:

 1. Schema index file, parsed by Perl/Bash/whatever
 2. base schema sql file, auto-generated from a clean install by a script
 that parses #1
 3. base sql data file, to populate menu tables, etc.

So how would this work exactly?  Are you thinking one DDL file per
table?  I guess I am wondering if this would create more maintenance
problems than it would solve.  I mean you have to store the DDL
somewhere, or you have to generate it from some other format which is
semantically reducible to the SQL DDL as supported by PostgreSQL.
Quite frankly, if we are going that route, we should be using SQL DDL
instead.



 Hmm. Is there any way in Postgres to define column ordering via an alter
 statement? This is something MySQL supports ;-)

Nope:
http://wiki.postgresql.org/wiki/Alter_column_position

Apparently nobody wants this badly enough to actually do the work required.
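
The only route today is the recreate-and-rename dance, roughly like
this (hypothetical table and columns; any indexes, constraints, and
views on the table would have to be recreated as well):

BEGIN;
CREATE TABLE foo_new (col_a integer, col_b text);  -- the desired column order
INSERT INTO foo_new SELECT col_a, col_b FROM foo;
DROP TABLE foo;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;

which is enough work that nobody does it casually.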


 This might be an issue for developers, but for regular users, I think we
 just need to make sure we don't change the update scripts after creating
 them -- as long as people follow the upgrade procedures, wouldn't new
 columns get added to the end of the table? Or does this vary by Pg version?

Do we want to be limited to always adding new fields to the end of the
table?  Given the choice, I would rather have the base DDL files focus
on clarity for developers, because that means fewer errors and
omissions.

Another issue we'd run into has to do with self-joins.  In general,
PostgreSQL's backup/restore processes are only guaranteed to work if
you dump the schema and data together in one file.  This is an issue
for tables like menu_node.  It's certainly possible to get around the
problem (by dropping the fkey, loading the data, and recreating it),
but that means additional effort and complexity, and I think we'd have
to weigh the benefit of doing so against the benefit of using
well-tested tools to dump data in a reloadable format rather than
maintaining that by hand.
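
Roughly, the workaround would be (untested sketch; the column and
constraint names are guesses -- check \d menu_node for the real ones,
and the data file name is made up):

psql -d lsmb -c 'ALTER TABLE menu_node DROP CONSTRAINT menu_node_parent_fkey;'
psql -d lsmb -f menu_node-data.sql
psql -d lsmb -c 'ALTER TABLE menu_node ADD FOREIGN KEY (parent) REFERENCES menu_node (id);'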

So I guess we should start from a few more basic questions:

Now that the menus should be pretty stable for the duration of 1.3
(absent bugs which you'd probably want flagged), what would you expect
such an output to look like that you can't get from a diff on various
versions of Pg-database?   I mean, what specifically is missing?
Table names?  Something else?  Or is it a matter of noise from
comparing dumps?

Best Wishes,
Chris Travers



Re: [Ledger-smb-devel] Database updates

2011-07-15 Thread John Locke
On 07/15/2011 12:04 PM, Chris Travers wrote:

 Now that the menus should be pretty stable for the duration of 1.3
 (absent bugs which you'd probably want flagged), what would you expect
 such an output to look like that you can't get from a diff on various
 versions of Pg-database?   I mean, what specifically is missing?
 Table names?  Something else?  Or is it a matter of noise from
 comparing dumps?

It's a matter of comparing the actual schema in the database with the
codebase.  I'm looking for a quick way to tell whether the failed
upgrade scripts have left my database schema in the old state or were
just noisy about success, without having to go table-by-table and
compare manually against Pg-database.sql.

Sure, it's easy to compare different versions of Pg-database.sql.  But I
recently skipped forward two weeks on trunk, ran all the new upgrade
scripts, many of which reported errors, and ended up with several
hundred lines of diffs in Pg-database.sql.  I really do not want to
spend hours trying to get my schema to match, figuring out which
upgrades failed and which schema changes were not even in an upgrade
script -- I'd much sooner spend those hours building a tool that will
tell me, at any point in the future, where my schema does not match
the code.

I can't be alone in this...

 So I'm suggesting:

 1. Schema index file, parsed by Perl/Bash/whatever
 2. base schema sql file, auto-generated from a clean install by a script
 that parses #1
 3. base sql data file, to populate menu tables, etc.
 So how would this work exactly?  Are you thinking one DDL file per
 table?  I guess I am wondering if this would create more maintenance
 problems than it would solve.  I mean you have to store the DDL
 somewhere, or you have to generate it from some other format which is
 semantically reducible to the SQL DDL as supported by PostgreSQL.
 Quite frankly, if we are going that route, we should be using SQL DDL
 instead.



No, I'm thinking one file for all of the tables, in a pre-determined
order. I want a quick way to tell if the schema I have in the database
matches what LSMB expects. If I have extra tables in the database, what
I have in mind would simply ignore them -- making it so you can have
add-ons without breaking this...

I'm looking for an automated way of creating a Pg-database.sql file so I
can compare it at any time against any (future) version of LSMB, and see
what schema alterations are necessary. I already can compare two
different versions of Pg-database.sql -- but I don't know how to quickly
tell how my installed schema matches up, without going table-by-table.
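
Something like this is what I have in mind (rough, untested; the file
names are made up):

# tables.lst is the hypothetical index file: one table name per line,
# in load order
while read t; do
  pg_dump -s -t "$t" lsmb_clean | grep -vE '^(SET |--|$)'
done < tables.lst > Pg-database-generated.sql

Run that against a clean install and you get a canonical base schema
file to diff against at any time.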

I'm assuming by DDL you mean the SQL statements to create a table? Or
some other format? I understand SQL, so that would be my preference, if
you're talking about some other format of DDL...


 Hmm. Is there any way in Postgres to define column ordering via an alter
 statement? This is something MySQL supports ;-)
 Nope:
 http://wiki.postgresql.org/wiki/Alter_column_position

 Apparently nobody wants this badly enough to actually do the work required.

 This might be an issue for developers, but for regular users, I think we
 just need to make sure we don't change the update scripts after creating
 them -- as long as people follow the upgrade procedures, wouldn't new
 columns get added to the end of the table? Or does this vary by Pg version?
 Do we want to be limited to always adding new fields to the end of the
 table?  Given the choice, I would rather have the base DDL files focus
 on clarity for developers, because that means fewer errors and
 omissions.

Ok, well, as a developer I can handle it if diff shows me a column
definition deleted from one line and added to another.  But going
forward, I think we should take care to make upgrade scripts work for
less technical users: either column sequence matches and doesn't
trigger a diff, or we provide a more sophisticated tool that
understands the DDL.  (Personally, I'd suggest we start with diff and
just add columns to the end of the table for minor releases, then
perhaps refactor column order at major releases.)
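
In other words, minor-release upgrade scripts would stay limited to
appends along the lines of (hypothetical table and column names):

ALTER TABLE sometable ADD COLUMN newcol integer;

which PostgreSQL always places at the end of the table, so a plain
diff of schema dumps stays clean.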


 Another issue that we'd run into has to do with self-joins.  In
 general the backup/restore processes from PostgreSQL are only
 guaranteed to work in the event you dump both the schema and data
 together in one file.  This is an issue for tables like menu_node.
 It's certainly possible to get around this problem (by dropping the
 fkey, loading the data, recreating it) but it means additional effort
 and complexity and I think we'd have to weigh the benefit of doing so
 against the benefit of using well tested tools to dump data in a
 reloadable format rather than maintaining that by hand.

Ok. I've been using pg_dump --format=c/pg_restore -c for backups -- but
that's a different problem than comparing the schema...
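
(That is, roughly:

pg_dump --format=c -f lsmb.backup mydb     # custom-format dump
pg_restore -c -d mydb lsmb.backup          # -c drops objects before recreating

with mydb and lsmb.backup as placeholder names.)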


 So I guess I think we should start from a few more basic questions:

 Now that the menus should be pretty stable for the duration of 1.3
 (absent bugs which you'd probably 

Re: [Ledger-smb-devel] Database updates

2011-07-15 Thread Chris Travers
On Fri, Jul 15, 2011 at 3:18 PM, John Locke m...@freelock.com wrote:

 So I'm suggesting:

 1. Schema index file, parsed by Perl/Bash/whatever
 2. base schema sql file, auto-generated from a clean install by a script
 that parses #1
 3. base sql data file, to populate menu tables, etc.
 So how would this work exactly?  Are you thinking one DDL file per
 table?  I guess I am wondering if this would create more maintenance
 problems than it would solve.  I mean you have to store the DDL
 somewhere, or you have to generate it from some other format which is
 semantically reducible to the SQL DDL as supported by PostgreSQL.
 Quite frankly if we are going that route, we should be using SQL DDL
 instead.



 No, I'm thinking one file for all of the tables, in a pre-determined
 order. I want a quick way to tell if the schema I have in the database
 matches what LSMB expects. If I have extra tables in the database, what
 I have in mind would simply ignore them -- making it so you can have
 add-ons without breaking this...


Ok, so if one file for all tables, wouldn't the index have to be
generated from the base table script and not the other way around?  If
so, what's wrong with generating it from the Pg-database.sql?  I can't
imagine the index file has enough info to generate the schema file by
itself, or you wouldn't need an index.

There may be something else available right now that can help.  The
current test rig contains a large number of database tests which do
not run by default but are production-safe.

If I run:

PGUSER=postgres PGPASSWORD='foo' PGDATABASE='lsmb13test6'
LSMB_TEST_DB=1 make test

against an out-of-date test db, I get:

Test Summary Report
---
t/43-dbtest.t   (Wstat: 1024 Tests: 148 Failed: 4)
  Failed tests:  54, 97, 104-105
  Non-zero exit status: 4
Files=18, Tests=2563, 20 wallclock secs ( 0.54 usr  0.05 sys + 13.31
cusr  0.93 csys = 14.83 CPU)
Result: FAIL
Failed 1/18 test programs. 4/2563 subtests failed.
make: *** [test_dynamic] Error 255

Just a note here on test safety:  All tests are safe on production,
with the exception of the LWP test rig, which has additional safeties
that must be disabled before it can run (i.e. it doesn't run unless
you really, really, really tell it to).  We have layers of safety here
to ensure test data doesn't pollute a production database, including
the fact that all scripts run in transactions and never commit,
meaning that all inserts etc. roll back.  These run mock-live tests
against a production database and flag misbehaviors or problems found.
In addition to looking at schemas, they also look at, for example,
stored procedure overloading, table permissions (albeit in a cursory
way), etc.  More information about how to run test cases can be found
in README.tests.
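
The pattern in the direct SQL test scripts is essentially this
(illustrative sketch; the table and column names are hypothetical):

BEGIN;
-- test rows use negative ids so an accidental commit can't collide
-- with real data
INSERT INTO sometable (id, description) VALUES (-9999, 'test row');
-- assertions against the test data run here
ROLLBACK;  -- nothing ever commits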

My own preference would be to try to build additional tests into the
database testing routines if there is a need to do so.  After all, if
we are looking for a way to test, we should probably test
automatically.  Of course, this involves highly recommending the test
rigs as a support tool.

If I were trying to do this what I would do is:
1)  Do a clean load into a designated test database environment.
2)  Connect to both databases using psql and list all tables and types
in the public schema.
3)  List schema for each table, parsing the psql output and sorting
consistently, and then comparing.
4)  Each comparison would be a test case.  Tables only in production
db would be ignored.  Tables only in the test base db would be
failures.
5)  Clean up the test db afterwards.

1 and 5 could be accomplished by adding an interface to the relevant
test scripts for db creation and teardown.  The others would have to
involve another test script of some sort.  This is a bit of work.  So
my first question is:  are the test rigs themselves (which test stored
procedure behavior, function overloading, and some permissions issues)
sufficiently inadequate to justify putting other stuff on hold while
coding something like this?  One caveat about the test framework is
that some things will likely not have test cases written until they
are nearing completion, because desired behavior in the lower levels
of a program can change a fair bit between first sitting down to work
on something and getting it ready to put into production.


 I'm looking for an automated way of creating a Pg-database.sql file so I
 can compare it at any time against any (future) version of LSMB, and see
 what schema alterations are necessary. I already can compare two
 different versions of Pg-database.sql -- but I don't know how to quickly
 tell how my installed schema matches up, without going table-by-table.

pg_dump -s [database_name] will get you most of the way there (it's a
backup of the database structure).  Note, however, that you'd want to
compare against a similar dump taken from a new installation, because
your dump will have a lot of stuff from contrib that's not in the
original file.
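
For example (untested; the database names are made up):

pg_dump -s lsmb_fresh_install > expected.sql   # new install, same Pg version
pg_dump -s lsmb_production    > actual.sql
diff -u expected.sql actual.sql

Taking both dumps with the same pg_dump version keeps ordering and
formatting consistent, so the diff shows real schema differences.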

[Ledger-smb-devel] Existing test runs, was re: Database updates

2011-07-15 Thread Chris Travers
Just as another note on test safety.

The following are listed in decreasing order of safety:

1)  Doesn't touch the database
make test
2)  Touches the database, but is never intended to commit.  Direct SQL
scripts partition data using negative ids to the extent possible, in
case of accidental commit.
PGUSER=postgres PGPASSWORD='foo' PGDATABASE='lsmb13test6'
LSMB_TEST_DB=1 make test

3)  Has a number of safety features built in, but errors in the code
COULD potentially drop an existing database (some such errors were
recently corrected).  Not recommended in production environments.
Runs the same tests as above, but against a new database freshly
loaded from Pg-database and sql/modules, before cleaning itself up.

PGPASSWORD='foo'  LSMB_NEW_DB='foo' PGUSER=postgres LSMB_TEST_DB=1
PG_CONTRIB_DIR=/usr/share/pgsql/contrib/ LSMB_COUNTRY_CODE='us'
LSMB_LOAD_COA='UCOA-Form990' LSMB_ADMIN_USERNAME=test123
LSMB_ADMIN_PASSWORD=testing LSMB_ADMIN_FNAME='Test'
LSMB_ADMIN_LNAME='Test' LSMB_LOAD_GIFI=UCOA-Form990 make test

4)  May write to the db.  DO NOT RUN AGAINST PRODUCTION
LSMB_USER='foo' LSMB_PASS='bar' PGDATABASE='baz'
LSMB_BASE_URL='http://localhost/ledgersmb_13/' LSMB_TEST_LWP=1 make
test

Best Wishes,
Chris Travers
