Re: [galaxy-dev] Contributing to genome indexes on rsync server

Jennifer Jackson Thu, 07 Mar 2013 11:02:10 -0800

Hi Brad (and Roman),

The team has talked about this in detail. There are a few wrinkles with just pulling in indexes - Dan is doing some work that could change this later on, but for now, the rsync will continue to point to the same location as Main's genome data source. This means that there are some limits on what we can do immediately. Setting up a submission pipe is one of them - there just isn't resource to do this right now or a common place distinct from Main to house the data. A few other ideas came up - we can chat later, each had side issues.

But I saw your tweet and think that it is great that you are pulling CloudBioLinux data from the rsync now, so let's get as much data in common as possible, so you have data to work with near term.

I am in the process of adding bt2 indexes - some are published to Main/rsync server already and some are not, but more will show up over the next week or so (along with more genomes and other indexes). I'll take a look at what you have and pull/match what I can. Genome sort order and variants are my concerns, both require special handling in processing and .locs. If it takes longer to check, I am just going to create here if I haven't already. The GATK-sort hg19 canonical is already on my list - it needed all indexes, not just bw2. When the next distribution goes out, I'll list what is new on the rsync in the News Brief.

For the Novoalign indexes, I'm not quite sure what to do about those yet. Or for any indexes associated with tools or genomes not hosted on Main. Do you want to open a card for those and any other cases that are similar? We can discuss a strategy from there, maybe at IUC, if Greg/Dan thinks it is appropriate. Please add me so I can follow.


I'll be in touch as I go through the data. Thanks for your patience on this!

Jen
Galaxy team

On 2/21/13 12:43 PM, Brad Chapman wrote:

Hi all;
Is there a way for community members to contribute indexes to the rsync
server? This resource is awesome and I'm working on migrating the
CloudBioLinux retrieval scripts to use this instead of the custom S3
buckets we'd set up previously:

https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py

It's great to have this as a public shared resource and I'd like to be
able to contribute back. From an initial pass, here are the things I'd
like to do:

- Include bowtie2 indexes for more genomes.

- Include novoalign indexes for a number of commonly used genomes.

- Clean up hg19 to include a full canonically sorted hg19, with indexes.
   Broad has a nice version prepped so GATK will be happy with it, and
   you need to stick with this ordering if you're ever going to use a
   GATK tool on it. Right now there is a partial hg19canon (without the
   random/haplotype chromosomes) and the structure is a bit complex.
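
The ordering concern above can be checked mechanically before building
indexes. The following is a minimal sketch, not anything from Galaxy or
CloudBioLinux: the helper names are hypothetical, and the expected order
has to be supplied by the caller (for example, read from the reference
you want to match), since the thread does not spell it out.

```python
def fasta_sequence_names(lines):
    """Return sequence names in the order they appear in FASTA-formatted lines."""
    names = []
    for line in lines:
        # Header lines start with '>'; the name is the first whitespace-delimited token.
        if line.startswith(">") and line[1:].strip():
            names.append(line[1:].split()[0])
    return names


def first_order_mismatch(observed, expected):
    """Return (position, observed_name, expected_name) at the first difference, or None.

    A name missing from one side is reported as None at that position.
    """
    for i, (obs, exp) in enumerate(zip(observed, expected)):
        if obs != exp:
            return (i, obs, exp)
    if len(observed) != len(expected):
        i = min(len(observed), len(expected))
        obs = observed[i] if i < len(observed) else None
        exp = expected[i] if i < len(expected) else None
        return (i, obs, exp)
    return None
```

Since `fasta_sequence_names` accepts any iterable of lines, an open file
handle on a FASTA file can be passed in directly.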

What's the best way to contribute these? Right now I have a lot of the
indexes on S3. For instance, the hg19 indexes are here:

https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
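
The links above share one naming pattern, so they can be generated
rather than listed by hand. This is a small sketch: the `index_url`
helper is hypothetical, and the assumption that other genome builds
follow the same `<genome>-<index>.tar.xz` layout is not confirmed in
this thread.

```python
# Base location taken from the hg19 links above.
BASE_URL = "https://s3.amazonaws.com/biodata/genomes"

# Index names as they appear in the hg19 links above.
HG19_INDEXES = ["bowtie", "bowtie2", "bwa", "novoalign", "seq", "ucsc"]


def index_url(genome, index):
    """Build the tarball URL for one genome/index pair (assumed naming pattern)."""
    return "{0}/{1}-{2}.tar.xz".format(BASE_URL, genome, index)


hg19_urls = [index_url("hg19", name) for name in HG19_INDEXES]
```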

I'm happy to format these differently or upload somewhere that would
make it easy to include. Thanks again for setting this up, I'm looking
forward to working off a shared repository of data,
Brad
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

   http://lists.bx.psu.edu/


--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
