My PhD work may be of interest here, although its primary focus has been on
generating databases comprising the changes within a specific timeframe, and it
was not designed specifically for Galaxy. Like the system you are proposing, it
can generate a BLAST database for any date (that has been added to the system),
as well as "diffs" between two dates. It supports FASTA, the UniProt EMBL
variant, full files (which do not give compression benefits) and several other
formats. The system uses delta compression to make sure that non-updated fields
do not take up extra space. It uses the Hadoop stack (HBase, HDFS and MapReduce)
for parallelism in generating the databases (the BLAST database generation from
FASTA files is not
You can find one of the publications here:
I hope this can be of some use to you.
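
In case it helps to make the delta idea concrete, here is a minimal sketch in
Python of storing only the changed fields per date, rebuilding a snapshot for
a given date, and producing a "diff" between two dates. This is an illustration
only and not the actual implementation: it is a single in-memory dictionary
rather than HBase/MapReduce, and the names DeltaStore, add_version, snapshot,
diff and to_fasta, as well as the "description" and "sequence" fields, are all
hypothetical. It assumes versions are added in chronological order and does
not handle removed fields or entries.

```python
from datetime import date


class DeltaStore:
    """Toy in-memory store: each update records only the fields that changed."""

    def __init__(self):
        # entry id -> list of (date, {field: new value}), in insertion order
        self._history = {}

    def snapshot_entry(self, entry_id, when):
        """Rebuild one entry as of `when` by replaying its deltas in order."""
        state = {}
        for version_date, delta in self._history.get(entry_id, []):
            if version_date <= when:
                state.update(delta)
        return state

    def add_version(self, entry_id, when, fields):
        """Store only the fields that differ from the state known at `when`."""
        current = self.snapshot_entry(entry_id, when)
        delta = {k: v for k, v in fields.items() if current.get(k) != v}
        if delta:
            self._history.setdefault(entry_id, []).append((when, delta))
        return delta

    def snapshot(self, when):
        """All entries that exist as of `when`, fully reconstructed."""
        result = {}
        for entry_id in self._history:
            state = self.snapshot_entry(entry_id, when)
            if state:
                result[entry_id] = state
        return result

    def diff(self, start, end):
        """Fields that changed after `start` and up to and including `end`."""
        changed = {}
        for entry_id, versions in self._history.items():
            merged = {}
            for version_date, delta in versions:
                if start < version_date <= end:
                    merged.update(delta)
            if merged:
                changed[entry_id] = merged
        return changed


def to_fasta(snapshot):
    """Render a snapshot as FASTA text (header line, then sequence line)."""
    lines = []
    for entry_id in sorted(snapshot):
        fields = snapshot[entry_id]
        header = f">{entry_id} {fields.get('description', '')}".rstrip()
        lines.append(header)
        lines.append(fields.get("sequence", ""))
    return "\n".join(lines)


store = DeltaStore()
store.add_version("P12345", date(2012, 1, 1),
                  {"description": "Example protein", "sequence": "MKT"})
# Only "sequence" differs here, so only that field is stored for this date.
store.add_version("P12345", date(2012, 2, 1),
                  {"description": "Example protein", "sequence": "MKTA"})

fasta_text = to_fasta(store.snapshot(date(2012, 2, 1)))
changes = store.diff(date(2012, 1, 1), date(2012, 2, 1))
```

The FASTA text for a chosen date is what a database build step would consume;
the diff only carries the fields that changed between the two dates.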