>> Well, other than the standard Perl steps "copy to blib" and "manify POD",
>> we don't really have build steps of our own. I'm indeed proposing that we
>> have a build step. I envision it to be something like this:
>>
>> * Create PostgreSQL database
>> * Play Sqitch scripts
>> * Dump schema from the database
>>
>> and then consider the dumped schema a build artifact which doesn't need
>> version control. This step would be executed before creating a distribution
>> archive.
>>
>> The pre-distribution step should -if we go with Sqitch- also include
>> compilation of a Sqitch bundle, so we don't distribute development
>> progress, branches, or whatever that might be in the Sqitch repository
>> (which is - as far as I understand it - stored in the development tree).
>>
>
> I was assuming we would ship the schema.  So for a new db, no need for the
> end user to build from sqitch.  We could also use sqitch to generate
> migration files etc.  That way *we* need sqitch to get things going, but
> users don't need it.
>

Ok. My idea was (and still is) that we ship the schema for *new* databases,
but we'd ship a sqitch bundle for upgrades. How do you think we should
handle upgrades, if we don't require users to have at least part of sqitch
installed?
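
Just to make the release-time build step concrete, here's roughly what I have
in mind, as a sketch only (the scratch database name, file paths and options
are invented and untested):

use strict;
use warnings;

# Sketch of the pre-distribution build step; names and paths are invented.
my $scratchdb = 'lsmb_release_build';

# 1. Create a scratch PostgreSQL database
system('createdb', $scratchdb) == 0
    or die "createdb failed\n";

# 2. Play the Sqitch scripts against it
system('sqitch', 'deploy', '--target', "db:pg:$scratchdb") == 0
    or die "sqitch deploy failed\n";

# 3. Dump the schema; the dump is a build artifact, not version controlled
system('pg_dump', '--schema-only', '-f', 'sql/ledgersmb-schema.sql', $scratchdb) == 0
    or die "pg_dump failed\n";

# 4. Bundle the Sqitch plan and change scripts for shipping upgrades
system('sqitch', 'bundle', '--dest-dir', 'blib/sqitch') == 0
    or die "sqitch bundle failed\n";

New databases would then get created from the dumped schema; upgrades would be
run from the bundle.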


> Oh, one downside to sqitch (from reading the docs) is that it requires union
> merge on the sqitch.plan file, so we may want to wait a little for this to be
> supported in Mercurial.  That would also provide an easily checkable
> intermediary step.
>

Agreed, but union merge support has now been merged into Mercurial; AFAICT,
it'll be part of the 3.6 release, due some time around October -- much
earlier than we'll be changing over to Sqitch, I'd imagine (I'm thinking
we'd start using Sqitch at 1.5.0).


>>> PG Extensions are definitely in the "reuse work of others" category but the
>>> system solves a different problem than sqitch.
>>>
>>
>> This is an interesting difference in views and we should explore it a bit.
>>
>>
>>> Indeed the two together could be a very powerful combination.
>>>
>>> PG Extensions would allow us to ship db components (whether table sets,
>>> etc) in a versioned way.  There is likely to be a learning curve here and
>>> it adds a major build dependency for people installing from our source
>>> packages.  But it provides some really nice features  including versioning
>>> and dependency tracking/enforcement.
>>>
>>> What PG Extensions would allow us to do would be to effectively ship
>>> versioned pieces of our db schema and have an orderly way of upgrading them.
>>>
>>
>> Ok. So far, I'm not sure I see the difference with Sqitch, apart from the
>> fact that Sqitch is something we'd have to depend on and extensions are
>> built into PostgreSQL. One thing where I can see a difference - but we
>> wouldn't be using it, is my current estimation - is managing *external*
>> dependencies. But with Sqitch, we can manage internal dependencies, so, if
>> that's what we're using extensions for, what would be your envisioned
>> difference?
>>
>
> There are a few differences.  First, reloading the db stored procedures
> can be as simple as "drop extension... create extension...." across each of
> the processing modules.
>

Depending on whether we'd design the modules to also support the "ledgersmb
extension" use-case (the role Budgetting played for 1.3), extensions would
also need to contain tables, I think (is that even possible?). Dropping an
extension becomes quite problematic in that case. My proposed use of Sqitch
is entirely directed at versioning the core LedgerSMB database schema,
though, so if we restrict the use of extensions to that, I guess you're
right.

However, I think John's point is also very important: if we want to support
cloud-hosted databases (and I think we do?), then how would that
interoperate with extensions? I know how it would interoperate with Sqitch:
seamlessly, because everything will be run over normal database connections.
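
To illustrate what I mean by "normal database connections": deploying against
a hosted database would just be a matter of pointing Sqitch at a connection
URI (sketch; host, user and database name are invented):

use strict;
use warnings;

# Sketch only; host, user and database name are invented. Sqitch needs
# nothing beyond an ordinary client connection to the server, so hosted
# databases are not a problem.
my $target = 'db:pg://lsmb_admin@db.example.com:5432/company_db';
system('sqitch', 'deploy', '--target', $target) == 0
    or die "sqitch deploy against the hosted database failed\n";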


>   The second thing is we can track internal dependencies at install time
> without pushing another versioning system to users (they need the pg build
> tools installed but they will be from any source build and only a small
> package away for debian users).
>

I'm not sure what you mean here: in my vision, users wouldn't need to learn
about it; we'd simply add App::Sqitch as a dependency of LedgerSMB, meaning
it gets installed from CPAN or from the packaging system. Much like Starman
and - in 1.6 (?) - Dancer.

Then, in setup.pl, we can invoke Sqitch's APIs to run the database upgrades
required. No need for users to see anything about Sqitch.
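
Roughly like this (a sketch; I haven't verified the exact programmatic entry
point of App::Sqitch, so treat that part as an assumption -- worst case we
simply shell out to the sqitch command instead):

use strict;
use warnings;
use App::Sqitch;

# Sketch: drive Sqitch from setup.pl so users never see it. The assumption
# is that App::Sqitch->go behaves like the command-line client and reads
# its arguments from @ARGV; otherwise we call the sqitch CLI directly.
sub upgrade_company_database {
    my ($dbname) = @_;

    local @ARGV = ('deploy', '--target', "db:pg:$dbname");
    my $exit = App::Sqitch->go;
    die "Schema upgrade of $dbname failed\n" if $exit;
    return 1;
}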

> Secondly, we can make our version tracking useful for external modules.
>

While I do see value in offering version tracking to add-ins and other
extensions, I think we have way too little infrastructure in place for
people to start writing add-ins at all: currently, it's impossible to add a
tab to the contacts screen without patching a truckload of files. And
that's even just an extremely basic use case. If we want to offer an
extension system (which we probably do, in time), we'd need to look long
and hard at Drupal, Joomla and others to copy their best practices. For
now, I'm thinking that providing a stable webservice API is much more
important.


>>> For things like stored procedure sections, we could always drop the
>>> extension and load a new version.  But it would make it very clean.  And we
>>> could even break up parts of the db into modules that could be
>>> independently versioned.
>>>
>>
>>> Getting this tooling to work right is not a minor project however.  It
>>> definitely cannot be done in a minor release because of the additional
>>> dependencies, and I think we'd need a policy against adding extensions to
>>> core in major versions.  However it would also provide a nice platform for
>>> third party integration and reuse as well.
>>>
>>> Each of our modules could be an extension and separately versioned.
>>>
>>
>> Ok. I like the technical side of that, but I'm not sure we need that at
>> this point. I *do* see value for this in terms of optional modules. Say the
>> way we used to have Budgetting as an extension in 1.3 (although it solves
>> only 20% of the problem, because hooking up Budgetting into the
>> web-application was actually a bigger problem than getting the tables
>> installed).
>>
>
> Right.  This is in part for the benefit of optional modules and third
> party applications.
>

Ok. But do you mean that it would address the 20% of the problem, leaving
the other 80% unsolved? Combined with John's point about cloud-hosted
databases not having our extensions installed on the server (thus
disallowing "CREATE EXTENSION ..." for our extensions), I'm very much
leaning toward using Sqitch over extensions.

>>> For branching and merging, the extensions would have a number specific to
>>> development head and could be re-numbered during the build process.  For
>>> modules all we really need to do is drop extension/create extension if the
>>> version number has changed.
>>>
>>> We'd probably want to reserve a number series for "next in version
>>> series."  I am thinking that using 999 would be a good one, so 1.4.999
>>> would be "whatever next comes in 1.4" and this would allow us to install
>>> and run out of the development tree.
>>>
>>
>> Hmmm. Sqitch - like version control systems - does this by itself (in
>> this case by generating a hash, like Git and other DVCSes).
>>
>
> Right.  I am proposing a difference between what we *ship* and what we
> develop for storage schemas.  I am also saying I don't think that is needed
> for the stored proc modules.
>

I too am proposing we make a difference between what we ship and what we
develop :-) However, my difference comes from using a development/release
machine to build, on every release, a clean DB schema for the creation of
new databases and a Sqitch bundle file for upgrading existing databases.
Neither exists during development; both have to be generated upon release.

BTW, what I found out is that Sqitch generates a blockchain of the changes
rolled out to the database, not of the changes in the plan file. Using the
blockchain in the database, it validates that the changes to be applied
come in the correct order. So, contrary to my original thought, changes can
(usually) be merged between branches without big issues.

>>> If we're going to go with Sqitch, I don't think it's a good idea to switch
>>> just then and there, but test it to build up some experience and then
>>> choose a well defined point in time to start using it.
>>>
>>> Agreed.  Maybe start on plugins?
>>>
>>
>> Actually, I was thinking of starting with 1.5 *after* it's been upgraded
>> from 1.4? However, that would mean that *everybody* would need to migrate
>> through 1.5.0, because that's the moment we'll move to Sqitch? (Or we could
>> choose 1.6 or ...?)
>>
>
> That places a lot of responsibility on a .0 release.  I would suggest if
> we go that route, we still  maintain migration tools for 1.4 + 1.3 and we
> ship db schemas rather than sqitch plans.  Maybe after we have experience
> with it, we can then switch to sqitch plans in a future migration branch?
>

At first I didn't understand what you meant here at all, but I think I do
now: you think I was saying we would not provide upgrade scripts. (Although
I don't see how else we'd get to 1.5.0.)

What I meant was (and is) this process: when upgrading from 1.3 or 1.4, the
upgrade scripts bring the install from the respective 1.3/1.4 version to
the database schema that corresponds to 1.5.0. At that point, the schema
gets "Sqitch initialized", and from then on, all upgrades are run through
the Sqitch bundle of a given release.

Given that all upgrades in Fixes.sql always need to be run anyway, why is
this too big a responsibility for Sqitch or the 1.5.0 release?
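
Concretely, the "Sqitch initialized" step could be as small as recording the
1.5.0 baseline in Sqitch's registry without re-running the scripts; I believe
deploy has a --log-only mode for exactly this, but that needs verifying
(the database name and tag below are invented):

use strict;
use warnings;

# Sketch; database name and tag are invented. After the 1.3/1.4 upgrade
# scripts have produced the 1.5.0 schema, record the 1.5.0 changes as
# already deployed (--log-only), then let normal deploys take over.
my $target = 'db:pg:company_db';

system('sqitch', 'deploy', '--target', $target,
       '--to-change', '@1.5.0', '--log-only') == 0
    or die "failed to record the 1.5.0 baseline in the Sqitch registry\n";

# From here on, every later upgrade is just a regular deploy of the
# shipped bundle.
system('sqitch', 'deploy', '--target', $target) == 0
    or die "sqitch deploy failed\n";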

>>> So, there are 2 proposals here. What do you say?
>>>
>>> I don't see how we can get away from doing some things ourselves, so I
>>> guess the question is what and where to reuse other work.
>>>
>>> So my preference would be sqitch and pg extensions.  In my mind this
>>> would require a two-step build process for dbs:
>>>
>>> 1.  Build the schema extensions using sqitch
>>> 2.  Install all pg extensions using make.
>>>
>>
>> Ok. Is this on releases only, or with our day-to-day work on the tree?
>>
>
> For db storage schema changes, on release only.  We can always play sqitch
> on dev databases.
>

So far, we're on the same page for new installs. But what about upgrades
(e.g. 1.5.0 -> 1.5.2)?


>>> The second though does impose some user dependencies in that pg_config and
>>> other pg development tools must be installed before  the build can be
>>> done.  Of course for those installing debs, these would be done when
>>> building debs rather than when installing them.
>>>
>>
>> If we use sqitch only, we also need it as a dependency, both for
>> developers and for end users. However, we probably don't need to teach our
>> users about it, if we simply use it as a library.
>>
>
> True, but we have to support it.  And we have to depend on it, make sure
> it is installed, etc.
>

True, but there's a *huge* list of items we currently depend on (TeX is
100s of MBs!); I do see a small risk here, but I find our dependency on TeX
for PDF generation much more thorny. Maybe we should consult Jame for his
opinion on the extra dependency? I mean, he's maintaining packages for
LedgerSMB. (I could see us making sure there are Sqitch packages available
for Ubuntu/Debian, either through apt.ledgersmb.org or otherwise.)

> Also there are a bunch of areas I am not 100% sure about in this regard
> (why I think extensions are cleaner).
> For example we support a bunch of different versions of PostgreSQL.  Some,
> but not all, will throw an error if you do something like:
>
> create or replace function foo (bar int) returns bool language sql as $$
> select true; $$;
> create or replace function foo (baz int) returns bool language sql as $$
> select true; $$;
>

> The reason is that more recent versions of Pg tightened up the handling of
> named arguments so now the change in argument name requires a drop function
> before the new one can be created.
>

Right. We are lacking upgrade tests in our test suite; this is definitely
one of the things to add, because upgrades have been broken over exactly
this point before. (I just filed
https://github.com/ledgersmb/LedgerSMB/issues/814 .)
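
The kind of upgrade test I have in mind would be something like this (sketch
only; the test database and the function are invented): install the old
definition, then verify the upgrade path drops the old signature before
creating the renamed-argument version.

use strict;
use warnings;
use Test::More;
use DBI;

# Sketch; test database and function are invented.
my $dbh = DBI->connect('dbi:Pg:dbname=lsmb_upgrade_test', undef, undef,
                       { RaiseError => 1, PrintError => 0 });

ok($dbh->do(q{CREATE OR REPLACE FUNCTION foo(bar int) RETURNS bool
              LANGUAGE sql AS $$ SELECT true; $$}),
   'old definition installs');

# Newer Pg refuses to rename an argument through CREATE OR REPLACE,
# so the upgrade path has to drop the old signature first.
ok($dbh->do(q{DROP FUNCTION IF EXISTS foo(int)}),
   'old signature dropped');

ok($dbh->do(q{CREATE OR REPLACE FUNCTION foo(baz int) RETURNS bool
              LANGUAGE sql AS $$ SELECT true; $$}),
   'new definition with renamed argument installs');

done_testing;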


> I don't know how sophisticated sqitch is in this regard, and how easy it
> will be to mess things up so a rebuild ends up with dropped functions that
> are never reloaded.
>
> As we have it right now, storage tables are separate from data processing
> routines so with the extension system rebuilding is as simple as:
>
> BEGIN TRANSACTION;
> DROP EXTENSION...
> DROP EXTENSION ....
> ...
> CREATE EXTENSION ....
> CREATE EXTENSION ....
> ....
> COMMIT;
>
> If something goes wrong, the extensions are exactly where they were before
> the failed effort.  In other words, our entire db rebuild/upgrade process
> would be entirely transactional (while right now it is one transaction per
> module), and it would be more predictable, clearer, simpler, and with
> probably less code.
>

Right. Sqitch runs each change in its own transaction, not the entire list
of changes in a single one. However, since every change runs in a separate
transaction, the database remains in a well-defined state at all times: it
can be upgraded from the "current" state to any later state with a
well-defined list of scripts.
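
So, for a failed upgrade, the recovery story is simply (sketch; database name
invented): fix the offending change script and re-run deploy, which continues
from the last successfully applied change.

use strict;
use warnings;

# Sketch; database name invented. Each change runs in its own transaction,
# so a failure leaves the database at the last successfully applied change.
my @deploy = ('sqitch', 'deploy', '--target', 'db:pg:company_db');

system(@deploy) == 0
    or warn "deploy stopped at the last successfully applied change\n";

# ... fix the failing change script in the bundle, then simply:
system(@deploy) == 0
    or die "deploy still failing\n";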

-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.