My PhD work may be of interest here, although its primary focus has been on 
generating databases comprising the changes within a specific timeframe, and 
it was not designed specifically for Galaxy. Like the system you are 
proposing, it can generate a BLAST database for any date that has been added 
to the system, as well as "diffs" between two dates. It supports FASTA, the 
UniProt EMBL variant, full files (which do not give compression benefits) and 
several others. The system uses delta compression so that fields that have not 
been updated do not take up extra space. It uses the Hadoop stack (HBase, HDFS 
and MapReduce) for parallelism in generating the databases (the BLAST database 
generation from FASTA files is not

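To make the delta idea above concrete, here is a minimal single-machine sketch 
in Python. It is not the actual implementation (which runs on HBase and 
MapReduce); the names DeltaStore, add, snapshot, diff and to_fasta are 
hypothetical and chosen only for illustration. It assumes that releases are 
added in chronological order, that dates are strings that sort 
chronologically, and that every record has "description" and "sequence" 
fields. Deleted records are not handled.

```python
from bisect import bisect_right


class DeltaStore:
    """Toy field-level delta store: a field is saved only when its value changes."""

    def __init__(self):
        # (record_id, field) -> list of (date, value), appended in date order
        self._history = {}

    def add(self, date, record_id, fields):
        """Add one release of a record; unchanged fields take no extra space."""
        for field, value in fields.items():
            versions = self._history.setdefault((record_id, field), [])
            if not versions or versions[-1][1] != value:
                versions.append((date, value))

    def snapshot(self, date):
        """Rebuild every record as it looked on the given date."""
        records = {}
        for (record_id, field), versions in self._history.items():
            dates = [d for d, _ in versions]
            i = bisect_right(dates, date)  # number of versions dated <= date
            if i:
                records.setdefault(record_id, {})[field] = versions[i - 1][1]
        return records

    def diff(self, start, end):
        """Return the records whose content differs between the two dates."""
        old, new = self.snapshot(start), self.snapshot(end)
        return {rid: rec for rid, rec in new.items() if old.get(rid) != rec}


def to_fasta(records):
    """Render records as FASTA text (assumes description and sequence fields)."""
    return "".join(
        f">{rid} {rec['description']}\n{rec['sequence']}\n"
        for rid, rec in sorted(records.items())
    )
```

With two releases added, to_fasta(store.snapshot(date)) would give the full 
FASTA for one date and to_fasta(store.diff(date1, date2)) only the entries 
that changed in between; either could then be passed on to BLAST database 
generation.
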
You can find one of the publications here: 

I hope this can be of some use to you.


Edvard Pedersen