-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
> What has occurred to me after reading your suggestions, is the fact that
> I have captured the SQL generated by MartBuilder as a text file (as well
> as invoking MartRunner), when I created our Mart. Therefore it would
> seem to me an easy option to take this SQL, amend it to refresh, not
> create, all the components in my Mart, by running this amended SQL as a
> Cron job, for example. Does this seem like a good course of action to
> you? Are there any pitfalls that I need to avoid? (Other than the
> usual pitfalls of editing a large set of SQL commands!)
The pitfall is simply working out how to write the SQL!
Marts are denormalised forms of your normalised data, and so they
contain much duplication and also don't contain all the original
information - they only take as much as they need. Importantly, they
don't always contain all the primary+foreign keys involved in the
generation of each record (e.g. if you haven't included in your final
mart table all the primary+foreign key columns from every component
table that was merged into that table).
This leads to some interesting questions: Without all the key
information in place, how would the SQL know which rows in the existing
table relate to which rows in the modified database? How would it know
which rows in the database have changed, and in what way? If you've used
restrictions to filter the rows selected in your database, or expression
columns to compute values based on those rows, how would it know which
of these were affected? If a column was used in a filter but also masked
and therefore not present in the final table, how would the SQL know
which rows that filter affected?
These questions are pretty hard to answer manually for a specific case
except in the most simple situations, and extremely hard to answer in a
general case that would suit all possible marts. It's essentially the
same as asking for a database to be reconstructed from a mart that was
generated from it - this is impossible in most cases because marts
usually don't have all the original data and therefore couldn't
reconstruct the primary and foreign keys, which are essential.
> (Also I am aware that the presentational components of any Mart, as
> maintained via MartEditor, are stored in the Mart itself, so I would
> have to be careful that I didn't remove these in my update SQL script.
> I don't know the internal structure of Marts, but can I assume that the
> SQL generated by MartBuilder doesn't affect any components that
> MartEditor creates/updates?)
The presentation config is all stored in tables inside each mart. The
table names begin with the word 'meta'. If you back these tables up and
copy them to the new mart, this will suffice as long as all mart and
dataset names remain the same. MartBuilder's SQL does not touch these
tables in any case.
However, if you need to update your presentation config (e.g. if you
have a drop-down list that lists all chromosome names and your updated
data includes a new chromosome name), then after copying the config
across you'll have to update the config in MartEditor for it to pick up
the new values.
> Suggestion for future development: it would be great if MartEditor had a
> command line alternative, so I could store my presentational XML
> somewhere and apply it via a shell script for example. Also, it would
> be great if the SQL generated by MartBuilder could be flexible enough to
> be re-runnable, thus solving my problem ("Drop Table If Exists....." in
> MySQL, for example).
The configs are stored as XML documents (both compressed (gzip) and
uncompressed, with md5 checksums). If the changes you are making are the
kind that can be solved using regexes, then you could write a script
that extracts the relevant XML config for a dataset, applies the regex
to it, then puts it back into the database using an update statement.
What tasks would a command-line based MEditor do? I'm intrigued - we're
currently rethinking MartEditor anyway, so this is an interesting
feature we might like to include.
Re-runnable SQL would be nice, but only MySQL has SQL syntax that
supports such SQL (create or replace, drop if exists, etc.).
MartBuilder, being cross-platform, cannot use concepts that apply to
only one platform. The only way round this at present is to run your SQL
via MartRunner, which is clever enough to work out if tables are going
to conflict, at which point it will drop them first. This feature will
be available in the next release of MartBuilder/Runner - not yet out,
but Arek probably has a date in mind.
cheers,
Richard
- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416
http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHWUWi4C5LeMEKA/QRAkzUAJsF8yF9wcbvsxSL3stBxRnKbbOrDQCcC+6i
uFNLaMMdV9CM4khUjTAanyg=
=YQzH
-----END PGP SIGNATURE-----