Re: Data patches during upgrade

Dewayne Richardson Tue, 13 Jun 2017 06:09:51 -0700

The migrations manage only "structure" because of rollbacks with "goose Up
and Down" (not saying "data" can't be rolled back, but it gets trickier:
what if I screw up the "where" clause and inadvertently remove data outside
the scope of the migration?).
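
A minimal sketch of that rollback risk, assuming an in-memory SQLite
database as a stand-in for the real one.  The table layout, the row names
and the "Down" statement are all hypothetical and only illustrate how a
too-broad "where" clause reaches past the rows the migration added:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not the real Traffic Ops schema.
conn.execute(
    "CREATE TABLE type (id INTEGER PRIMARY KEY, name TEXT, use_in_table TEXT)"
)

# Rows that existed before the hypothetical data migration ran.
conn.executemany(
    "INSERT INTO type (name, use_in_table) VALUES (?, ?)",
    [("EDGE", "server"), ("HTTP", "deliveryservice")],
)

# "Up": the data migration adds two rows (names are made up).
conn.executemany(
    "INSERT INTO type (name, use_in_table) VALUES (?, ?)",
    [("STEERING_X", "steering_target"), ("STEERING_Y", "steering_target")],
)

# "Down" with a sloppy WHERE clause: it was meant to match only
# 'steering_target', but the pattern 's%' also matches 'server'.
conn.execute("DELETE FROM type WHERE use_in_table LIKE 's%'")

remaining = [row[0] for row in conn.execute("SELECT name FROM type ORDER BY id")]
print(remaining)  # the pre-existing EDGE row is gone along with the new rows
```

A structural "Down" (dropping a column or table the migration created) has
no such matching step, which is why it is easier to get right.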

Migrations managing only structure also helps with "separation".  I can
apply "test" or "integration" data to the same structure, without having to
run the migrations.  Test data doesn't always align with seed data because
I might need to artificially manipulate the data differently depending on
the test scenario I want to achieve.  If migrations have placed data in the
database when I need to test, now I have to remove that static data before
I even start.  Yes, we do that now with the test data, but the test data is
predictable.

We already have an ordering problem today with the migrations (just managing
the structure).  Say I'm working in a branch and commit a migration whose
file timestamp is yesterday, while you are working on a different migration
with a timestamp of today, and you commit and push before I do.  The
timestamp sequencing is now misaligned.  Once goose runs, it "bookmarks"
your migration in the goose_db_version table in front of mine, which causes
goose to ignore my migration and forces me to rename that file to a later
timestamp just so goose will see it.

So, if we "interleave" seed data (over time), managing that data order on
top of the structure management in goose will add yet another level of
complexity.  This is the reason the db/admin.pl was created to help with
this workflow.  I know it's not optimal, but these are the hurdles we hit
as we were figuring this stuff out.
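
For reference, the three-step order proposed earlier in this thread
(migration, then seeds, then patches) can be sketched as follows.  This is
only an illustration, assuming an in-memory SQLite database in place of the
real one; the column layout, the type names and the patch rule are
hypothetical, and admin.pl itself is not shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the database as it looks before the upgrade.
conn.executescript("""
CREATE TABLE type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE steering_target (id INTEGER PRIMARY KEY, value INTEGER);
INSERT INTO steering_target (value) VALUES (10), (20);
""")

# Step 1, migration: structure only.
MIGRATION = "ALTER TABLE steering_target ADD COLUMN type INTEGER;"

# Step 2, seeds: static data (type names are made up).
SEEDS = "INSERT OR IGNORE INTO type (name) VALUES ('STEERING_X'), ('STEERING_Y');"

# Step 3, patches: a data fix that depends on both earlier steps.
PATCHES = """
UPDATE steering_target
SET type = (SELECT id FROM type WHERE name = 'STEERING_X')
WHERE type IS NULL;
"""

for step in (MIGRATION, SEEDS, PATCHES):
    conn.executescript(step)

rows = conn.execute(
    "SELECT st.value, t.name FROM steering_target st "
    "JOIN type t ON t.id = st.type ORDER BY st.id"
).fetchall()
print(rows)
```

The patch can only run last: it needs the column from the migration and the
type ids from the seeds.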

-Dewayne

On Mon, Jun 12, 2017 at 10:14 PM, Naama Shoresh <[email protected]> wrote:

> Hi,
>
> I want to suggest a slightly different approach.
> Goose is the brain managing the DB upgrade, right?
> The data patches are part of the DB evolution, but today we can't use goose
> to run them because we have seeds.sql in the middle.
> What I suggest is turning seeds.sql into another migration script,
> resulting in the following procedure:
> 0) (In clean installations) Tables creation
> 1) Goose migrations:
> 1a) Schema changes
> 1b) Data seeding (seeds.sql)
> 1c) Data changes
>
> Going forward, I believe a data change migration script will be attached to
> most schema changes, instead of populating the DB in the seeds.sql.
>
> The benefits come from the fact that for future changes, the order
> presented above (1a, 1b, 1c) is not strict.
> Future schema/data changes are expressed in a single migration script,
> containing all relevant operations.
> This ensures that whatever change is needed (schema/data/both), and
> whatever the change depends on, it can be handled by a single Goose
> migration script.
>
> What do you think?
>
>
>
> On Fri, Jun 9, 2017 at 6:35 PM, Dewayne Richardson <[email protected]>
> wrote:
>
> > Yea, it's just a new feature to admin.pl to support data conversions, to
> > keep the migrations clean.  Derek and I have been working through it.
> >
> > -Dew
> >
> > On Thu, Jun 8, 2017 at 7:40 AM, Jeremy Mitchell <[email protected]>
> > wrote:
> >
> > > This seems to make sense to me but honestly, I'd probably defer to
> > > Dewayne.
> > >
> > > In theory, it would be nice if migrations only included "structural"
> > > changes (new tables, columns, changing column types or not null, etc)
> > > and seeds focused on the "base" (or the minimum required) static data
> > > required of TO (types, statuses, roles, etc) and then yea, putting
> > > data fixing or data massaging as the last step makes sense to me. But
> > > you know what they say about theory...
> > >
> > > +1
> > >
> > > Jeremy
> > >
> > > On Wed, Jun 7, 2017 at 8:41 AM, Gelinas, Derek <[email protected]>
> > > wrote:
> > >
> > > > I'm adding a feature to traffic ops that creates a new column in
> > > > steering_target called type, that is populated with type ids from
> > > > the type table.  Using admin.pl upgrade, the column is created in
> > > > migrations, and the two types for this table are populated by
> > > > seeds.sql.  None of this is out of the ordinary.  Unfortunately I
> > > > also need to populate the type column based on data that isn't in
> > > > there until after seeds.sql is run, so I can't place this into the
> > > > migration.  Seeds.sql needs to run after the migration due to any
> > > > structural changes that happen there.
> > > >
> > > > Dewayne and I have discussed this a bit this morning, and we're
> > > > thinking the best solution might be a third step, run after
> > > > seeds.sql, called patches.sql.  This would be specifically for data
> > > > fixes like in this use case.  The order would be as follows:
> > > >
> > > > migration - structure
> > > > seeds - static data
> > > > patches - data fixes
> > > >
> > > > Thoughts?
> > > >
> > > > Derek
> > >
> >
>
>
>
> --
> *Naama Shoresh*
> Qwilt | Work: +972-72-2221706 | Mobile: +972-52-3401999 |
> [email protected]
>
