Hello Shawn,
You can do this in three steps:
1 - Format your existing file, set type as interval,
and assign columns ("Edit attributes").
Start by changing this:
chr10:11,997,707-12,330,274
To become like this, separated by tabs:
chr10 11997707 12330274
Add in strand if possible:
chr10 11997707 12330274 +
2 - Obtain a mapped transcript file that includes gene identifiers
a) One choice is UCSC's "Known Genes" track:
From your working history, use tool "Get Data -> UCSC main"
Select the genome (hg18) and the track "UCSC Genes", with
output = selected fields from primary and related tables and
merge in identifiers from tables such as "hg18.kgXref". The
track "RefSeq Genes" is another option (RefSeq accession is
"name" and gene identifier is "name2"). Send query to Galaxy,
set type as interval, and assign columns.
b) Another choice would normally be Ensembl Genes from
"Get Data -> BioMart", but only hg19 is available there, so the
coordinates would not match your hg18 regions.
3 - Merge the files based on overlap
The tool you will most likely want to use is "Operate on
Genomic Intervals -> Join", although you may want to
explore others.
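
If you would like to prototype the same idea outside of Galaxy, the steps
above can be sketched in a few lines of Python. This is only a rough
illustration, not what the Join tool does internally. The function names are
made up for the example, and the gene tuples are assumed to be
(chrom, start, end, symbol) rows taken from the file obtained in step 2. It
strips the commas exactly as in step 1 and does not adjust between the
1-based positions shown in the browser and the 0-based starts used in UCSC
tables, so treat boundary cases with care.

```python
import re
from collections import defaultdict

# Matches browser-style positions such as "chr10:11,997,707-12,330,274".
REGION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_region(text):
    """Return (chrom, start, end) with the thousands separators removed."""
    match = REGION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def to_interval_line(text, strand=None):
    """Step 1: build the tab-separated interval line, with optional strand."""
    chrom, start, end = parse_region(text)
    fields = [chrom, str(start), str(end)]
    if strand:
        fields.append(strand)
    return "\t".join(fields)


def genes_per_region(regions, genes):
    """Step 3: for each region, list the symbols of overlapping genes.

    regions: iterable of (chrom, start, end)
    genes:   iterable of (chrom, start, end, symbol)
    Intervals are treated as half-open, so a gene that starts exactly
    where a region ends is not counted as overlapping.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, symbol in genes:
        by_chrom[chrom].append((start, end, symbol))

    results = []
    for chrom, r_start, r_end in regions:
        hits = {
            symbol
            for g_start, g_end, symbol in by_chrom.get(chrom, [])
            if g_start < r_end and g_end > r_start
        }
        results.append(sorted(hits))
    return results
```

For example, genes_per_region([parse_region("chr10:11,997,707-12,330,274")], genes)
returns one sorted list of symbols per region, which you could then join with
", " and append as the extra column. For large files the Galaxy tool will be
more practical, since this sketch compares every region against every gene on
the same chromosome.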
Help:
http://wiki.g2.bx.psu.edu/Learn/Interval%20Operations
Also see the screencasts at http://usegalaxy.org;
"quickies" #3 and #5 are good ones to start with.
Hopefully this helps to get you started!
Thanks,
jen
On 7/21/11 4:37 PM, Shawn Anderson wrote:
Hello,
I'm not sure if this is the place to ask this, but if so, here goes. I
have a list of genomic regions (from CNV gains and losses), each given as
chromosome, start and stop (e.g. chr7 68000000 71000000) for a given
genome build (hg18), and I want to add the genes (ideally HUGO gene
symbols or RefSeq IDs) that reside within each region, one region per line.
So I want to input something like this:
Sample        Chromosome Region             Event    Length
JC 507 CD19   chr10:11,997,707-12,330,274   CN Gain  332568
JC 507 CD19   chr10:47,563,503-48,085,608   CN Loss  522106
JC 507 CD19   chr10:69,510,584-69,951,738   CN Gain  441155
And get an output similar to this:
Sample        Chromosome Region             Event    Length  Gene Symbols
JC 507 CD19   chr10:11,997,707-12,330,274   CN Gain  332568  CDC123, DHTKD1, NUDT5, SEC61A2, UPF2
JC 507 CD19   chr10:47,563,503-48,085,608   CN Loss  522106  AGAP9, ANXA8, ANXA8L1, CTSL1P2, FAM25B, FAM25C, FAM25G, GDF10, GDF2, LOC642826, RBP3, ZNF488
JC 507 CD19   chr10:69,510,584-69,951,738   CN Gain  441155  ATOH7, DNA2, HNRNPH3, MYPN, PBLD, RUFY2, SLC25A16
Is this possible?
*Shawn Anderson*
Application Scientist - *Laboratory for Advanced Genome Analysis*
Vancouver Prostate Centre - Vancouver General Hospital
2660 Oak Street
Vancouver BC V6H 3Z6
P:604-875-4111 ext. 63436
F:604-875-5654
[email protected]
www.LAGAPC.ca <http://www.microarray.prostatecentre.com/>
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/