Hello Brad, Jennifer,
I'm also interested in this initiative since I'm using Brad's code as
a part of a testsuite for a bloom filter:
As both of you pointed out, there's a need for some cleanup; there are
some stray files here and there:
-rw-r--r-- 1 vagrant vagrant 34M May 28 2009 .2bit <---- ???
-rw-r--r-- 1 vagrant vagrant 779M Aug 13 2010 hg19.2bit
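
A rough sketch of how stray files like the nameless `.2bit` above could be flagged automatically. The function name `find_stray_files` and the extension list are assumptions made for illustration, not part of any existing Galaxy or CloudBioLinux script:

```python
import os

# Hypothetical list of index/genome extensions to check; adjust as needed.
KNOWN_EXTENSIONS = {"2bit", "fa", "fai", "bt2"}

def find_stray_files(root):
    """Return files under root whose name is only a known extension, e.g. '.2bit'."""
    stray = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # A file such as ".2bit" has an empty base name before the dot.
            if name.startswith(".") and name[1:] in KNOWN_EXTENSIONS:
                stray.append(os.path.join(dirpath, name))
    return sorted(stray)
```

Running `find_stray_files` on a genome directory would return candidates for manual review; it only reports paths and deletes nothing.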
In general I think that some good naming conventions in such a good
community resource would be *very* useful.
I will be glad to help! :)
@romanvg on Trello.
On Sat, Feb 23, 2013 at 8:39 PM, Brad Chapman <chapm...@50mail.com> wrote:
> That sounds great, thanks for your enthusiasm and help organizing this.
> I'm @bradchapman on Trello so feel free to add me to the ticket and let
> me know how I can help. I'm happy to set this up however you feel best:
> looking forward to having a shared repository for all this formatted
> genome data. Thanks again,
>> Hi Brad,
>> I really like this idea. I'm not going to open a ticket yet but talk
>> with Dan/team about some options. We have an alternate directory
>> structure modeled from last fall I'd like to get in place before we
>> start something like this ( is not yet implemented, but it or something
>> similar would be required to properly add in the hg19 GATK-sort. will
>> involve a bit of other data shuffling w/ .loc changes to keep external
>> links functional ). There are also some other repo ideas in play.
>> Let me get some internal feedback next week, then I'll start a Trello
>> ticket with some basics from our side that we can use to vet a plan to
>> go forward, assuming the rest of the team likes the idea. I think test
>> cases/index validation would probably be part of this somehow. And
>> certainly some simplification would be welcomed in the more cluttered dirs,
>> if that can be managed while keeping enough around for reproducibility
>> Very topical, thanks for bringing this up and offering to help out! These can be
>> a great deal of work to create and it makes total sense to share. I'll send
>> an update later next week, hopefully with a Trello link so we can get
>> On 2/21/13 12:43 PM, Brad Chapman wrote:
>>> Hi all;
>>> Is there a way for community members to contribute indexes to the rsync
>>> server? This resource is awesome and I'm working on migrating the
>>> CloudBioLinux retrieval scripts to use this instead of the custom S3
>>> buckets we'd set up previously:
>>> It's great to have this as a public shared resource and I'd like to be
>>> able to contribute back. From an initial pass, here are the things I'd
>>> like to do:
>>> - Include bowtie2 indexes for more genomes.
>>> - Include novoalign indexes for a number of commonly used genomes.
>>> - Clean up hg19 to include a full canonically sorted hg19, with indexes.
>>> Broad has a nice version prepped so GATK will be happy with it, and
>>> you need to stick with this ordering if you're ever going to use a
>>> GATK tool on it. Right now there is a partial hg19canon (without the
>>> random/haplotype chromosomes) and the structure is a bit complex.
>>> What's the best way to contribute these? Right now I have a lot of the
>>> indexes on S3. For instance, the hg19 indexes are here:
>>> I'm happy to format these differently or upload somewhere that would
>>> make it easy to include. Thanks again for setting this up, I'm looking
>>> forward to working off a shared repository of data,
>>> Please keep all replies on the list by using "reply all"
>>> in your mail client. To manage your subscriptions to this
>>> and other Galaxy lists, please use the interface at:
>> Jennifer Hillman-Jackson
>> Galaxy Support and Training