Re: [galaxy-dev] Contributing to genome indexes on rsync server

Roman Valls Tue, 05 Mar 2013 04:39:24 -0800

Hello Brad, Jennifer,

I'm also interested in this initiative since I'm using Brad's code as
a part of a testsuite for a bloom filter:


https://github.com/SciLifeLab/facs/blob/master/facs/utils/galaxy.py

As both of you pointed out, there's a need for some cleanup: there are
some stray files here and there, for example:

-rw-r--r-- 1 vagrant vagrant  34M May 28  2009 .2bit <---- ???
-rw-r--r-- 1 vagrant vagrant 779M Aug 13  2010 hg19.2bit

In general I think that some good naming conventions in such a good
community resource would be *very* useful.
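
As a rough illustration of the kind of check that could catch files like
the nameless .2bit listed above, here is a minimal Python sketch. The
function name find_unnamed_index_files and the KNOWN_EXTENSIONS set are
hypothetical, made up for this example; they are not part of the Galaxy
rsync layout or of Brad's code, and the extension list would need to be
adjusted to the real tree.

```python
from pathlib import PurePosixPath

# Hypothetical set of index/sequence extensions (an assumption for this
# sketch); extend it to match whatever the real directory tree contains.
KNOWN_EXTENSIONS = {".2bit", ".fa", ".fai", ".bt2", ".ebwt"}


def find_unnamed_index_files(names):
    """Return the entries whose file name is only an extension.

    A file such as '.2bit' has no genome build name in front of the
    extension, so it is flagged; 'hg19.2bit' is not.
    """
    stray = []
    for name in names:
        base = PurePosixPath(name).name  # drop any leading directories
        if base in KNOWN_EXTENSIONS:
            stray.append(name)
    return stray


if __name__ == "__main__":
    # The two file names from the listing above.
    print(find_unnamed_index_files([".2bit", "hg19.2bit"]))
```

The function works on a plain list of names, so it could be fed the
output of a directory listing without touching the files themselves.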

I will be glad to help! :)

@romanvg on Trello.

Cheers!
Roman

On Sat, Feb 23, 2013 at 8:39 PM, Brad Chapman <[email protected]> wrote:
>
> Jen;
> That sounds great, thanks for your enthusiasm and help organizing this.
> I'm @bradchapman on Trello so feel free to add me to the ticket and let
> me know how I can help. I'm happy to set this up however you feel best:
> looking forward to having a shared repository for all this formatted
> genome data. Thanks again,
> Brad
>
>
>> Hi Brad,
>>
>> I really like this idea. I'm not going to open a ticket yet, but I will
>> talk with Dan/team about some options. We have an alternate directory
>> structure, modeled last fall, that I'd like to get in place before we
>> start something like this (it is not yet implemented, but it or something
>> similar would be required to properly add in the hg19 GATK-sort; it will
>> involve a bit of other data shuffling w/ .loc changes to keep external
>> links functional). There are also some other repo ideas in play.
>>
>> Let me get some internal feedback next week, then I'll start a Trello
>> ticket with some basics from our side that we can use to vet a plan to
>> go forward, assuming the rest of the team likes the idea. I think test
>> cases/index validation would probably be part of this somehow. And
>> certainly some simplification would be welcomed in the more cluttered
>> dirs, if that can be managed while keeping enough around for
>> reproducibility needs.
>>
>> Very topical, thanks for bringing this up and offering to help out!
>> These can be a great deal of work to create, and it makes total sense
>> to share them. I'll send an update later next week, hopefully with a
>> Trello link so we can get started.
>>
>> Jen
>> Galaxy
>>
>> On 2/21/13 12:43 PM, Brad Chapman wrote:
>>>
>>> Hi all;
>>> Is there a way for community members to contribute indexes to the rsync
>>> server? This resource is awesome and I'm working on migrating the
>>> CloudBioLinux retrieval scripts to use this instead of the custom S3
>>> buckets we'd set up previously:
>>>
>>> https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/galaxy.py
>>>
>>> It's great to have this as a public shared resource and I'd like to be
>>> able to contribute back. From an initial pass, here are the things I'd
>>> like to do:
>>>
>>> - Include bowtie2 indexes for more genomes.
>>>
>>> - Include novoalign indexes for a number of commonly used genomes.
>>>
>>> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
>>>    Broad has a nice version prepped so GATK will be happy with it, and
>>>    you need to stick with this ordering if you're ever going to use a
>>>    GATK tool on it. Right now there is a partial hg19canon (without the
>>>    random/haplotype chromosomes) and the structure is a bit complex.
>>>
>>> What's the best way to contribute these? Right now I have a lot of the
>>> indexes on S3. For instance, the hg19 indexes are here:
>>>
>>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-bowtie2.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-bwa.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-novoalign.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-seq.tar.xz
>>> https://s3.amazonaws.com/biodata/genomes/hg19-ucsc.tar.xz
>>>
>>> I'm happy to format these differently or upload somewhere that would
>>> make it easy to include. Thanks again for setting this up, I'm looking
>>> forward to working off a shared repository of data,
>>> Brad
>>> ___________________________________________________________
>>> Please keep all replies on the list by using "reply all"
>>> in your mail client.  To manage your subscriptions to this
>>> and other Galaxy lists, please use the interface at:
>>>
>>>    http://lists.bx.psu.edu/
>>>
>>
>> --
>> Jennifer Hillman-Jackson
>> Galaxy Support and Training
>> http://galaxyproject.org
