Well well, thanks very much for that reference!  I can see how your system, which 
lets a workflow process only delta (diff) data and merge the results back into 
a previous run's output, would greatly reduce the processing power needed to keep 
results current.  Interesting choice of technologies too. 

Damion

Message: 5
Date: Fri, 5 Sep 2014 10:41:37 +0000
From: Pedersen Edvard <edvard.peder...@uit.no>
To: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu>
Subject: Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data
        Retrieval Tool

My PhD work may be of interest for this subject, although its primary focus has 
been on generating databases comprising the changes from a specific timeframe, 
and it was not designed specifically for Galaxy. The similarities between my 
system and the one you are proposing are that it can generate a BLAST 
database for any date (that has been added to the system), as well as "diffs" 
between two dates, and that it supports FASTA, the UniProt EMBL variant, full 
files (which do not give compression benefits) and several other formats. The 
system uses delta compression to make sure that non-updated fields do not take 
up extra space. It uses the Hadoop stack (HBase, HDFS and MapReduce) for 
parallelism in generating the databases (the BLAST database generation from 
FASTA files is not parallel).
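
As an illustration only (this is not the Hadoop/HBase implementation from the 
paper), the idea of storing only changed records and rebuilding either a 
snapshot for a date or a "diff" between two dates could be sketched in plain 
Python. The names VersionedFastaStore, add_release, snapshot, diff and to_fasta 
are hypothetical, the store lives in memory, and releases are assumed to be 
added in chronological order.

```python
from datetime import date


class VersionedFastaStore:
    """Toy in-memory store (hypothetical): a record gets a new version
    only when its sequence changes, so unchanged records cost nothing."""

    def __init__(self):
        # record id -> list of (release date, sequence); None marks a deletion
        self._history = {}

    def add_release(self, when, records):
        """Register a full release; assumes releases arrive in date order."""
        current = self.snapshot(when)
        for rec_id, seq in records.items():
            if current.get(rec_id) != seq:
                self._history.setdefault(rec_id, []).append((when, seq))
        # Records missing from this release are marked as deleted.
        for rec_id in current.keys() - records.keys():
            self._history[rec_id].append((when, None))

    def snapshot(self, when):
        """Return {id: sequence} as it stood on the given date."""
        out = {}
        for rec_id, versions in self._history.items():
            seq = None
            for released, value in versions:
                if released <= when:
                    seq = value
            if seq is not None:
                out[rec_id] = seq
        return out

    def diff(self, start, end):
        """Return records that are new or changed between two dates.
        Deletions are not reported in this sketch."""
        old, new = self.snapshot(start), self.snapshot(end)
        return {rid: seq for rid, seq in new.items() if old.get(rid) != seq}


def to_fasta(records):
    """Serialise {id: sequence} as FASTA text, sorted by id."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in sorted(records.items()))
```

The FASTA text returned by to_fasta for a snapshot or a diff could then be 
handed to whatever builds the BLAST database; that step is outside this sketch.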

You can find one of the publications here: 
<http://bdps.cs.uit.no/papers/hibb13.pdf> 

I hope this can be of some use to you.

Regards,

Edvard Pedersen

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
