Hello Catheryn,
Yes, all of this can be done. Once you have an annotation source
identified (or sources!), the rest is part of the core functionality of
Galaxy.
One of the outputs from MACS is a BED file with the peaks. BED format is
similar to interval format, so the file can be used with the tools in the
group "Operate on Genomic Intervals" or, kept as BED, with the tools in
the group "BEDTools" (such as 'Intersect multiple sorted BED files'). If
you need help understanding these datatypes, this wiki page explains
them - see the last bullet for links:
http://wiki.galaxyproject.org/Support#Dataset_special_cases
The idea is to obtain annotation data also in BED/interval format, then
perform the comparisons. Where there is overlap (or no overlap, in the
case of intergenic), the annotation can be assigned. I am not sure what
genome you are working with, but if it is available from UCSC or another
common public site, this can be fairly straightforward. One point is
very important: the annotation must be extracted from the exact same
base reference genome that you mapped against. The name in Galaxy will
be the same exact name as at the source in nearly all cases. Please ask
if you have a question about this.
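If it helps to see the overlap logic spelled out, here is a minimal Python sketch. It is only an illustration, not what the Galaxy tools run internally. The function names and the sample coordinates are made up, and it assumes BED-style 0-based, half-open intervals.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_peaks(peaks, features):
    """Label each peak with the names of overlapping features, or 'intergenic'.

    peaks:    list of (chrom, start, end)
    features: list of (chrom, start, end, name)
    """
    annotated = []
    for chrom, start, end in peaks:
        hits = [name for f_chrom, f_start, f_end, name in features
                if f_chrom == chrom and overlaps(start, end, f_start, f_end)]
        # A peak with no overlapping feature is treated as intergenic.
        annotated.append((chrom, start, end, hits or ["intergenic"]))
    return annotated


# Hypothetical example data (not real annotation).
peaks = [("chr1", 100, 200), ("chr1", 5000, 5100), ("chr2", 100, 200)]
features = [
    ("chr1", 150, 400, "geneA_promoter"),
    ("chr1", 180, 900, "geneA_intron1"),
]
result = annotate_peaks(peaks, features)
```

A peak can overlap more than one feature, so the sketch keeps every hit rather than picking one. How to prioritize them (promoter over intron, for example) is a separate decision.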
At UCSC, the Table browser contains all the annotation tracks found in
the Browser itself, and you will most likely want to use those from the
"Gene and Gene Prediction" group, although there are likely others in
the ENCODE group that are also of interest. The description for each
track is at UCSC, including methods, often very detailed. When
extracting the data (using the tool "Get Data -> UCSC Main table
browser"), options to subset the BED output regions by exons or introns
or predicted promoter regions, etc. are available.
BioMart can be another great source of annotation, especially for
genomes in Ensembl annotation builds. The tool would be "Get Data ->
BioMart Central server". The same basic extraction concepts apply,
although the form is organized differently. The help there will guide
you. The important parts are the chromosome, start, and end. The best
tip I can offer when working with BioMart data is to avoid HTML content,
which is often found in the longer descriptions. If you get an import
error about HTML content, this isn't a huge problem. Just try again,
eliminating suspected fields. The field(s) with the HTML can usually be
identified quickly with a few test imports.
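If you have a copy of the export saved locally, a quick check along these lines can point at the suspect columns before you retry. This is a rough sketch under assumptions: the function name, the pattern for "looks like HTML", and the sample column headers are all hypothetical, and it assumes a tab-separated file with a header row.

```python
import csv
import io
import re

# Matches text that looks like an HTML tag or entity, e.g. <a href=...>, </b>, &amp;
HTML_PATTERN = re.compile(r"</?[A-Za-z][^>]*>|&[A-Za-z]+;|&#\d+;")


def columns_with_html(tsv_text):
    """Return the header names of columns where any value looks like HTML."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    flagged = []
    for row in reader:
        for column, value in row.items():
            # Skip missing or malformed cells; only inspect plain strings.
            if not isinstance(value, str):
                continue
            if HTML_PATTERN.search(value) and column not in flagged:
                flagged.append(column)
    return flagged


# Hypothetical sample export (headers and values are made up).
sample = (
    "Chromosome\tStart\tEnd\tDescription\n"
    "1\t1000\t2000\tkinase <a href='http://example.org'>source</a>\n"
    "1\t3000\t4000\tplain text description\n"
)
flagged = columns_with_html(sample)
```

Any column it flags is a candidate to leave out of the next import attempt.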
There are other sources in this "Get Data" tool group and many other
external annotation projects that have data (from these you can simply
download/upload or directly load via a URL). You can start with a larger
file with all of the details, compare with just coordinates, then go
back and pick up the details with a final join. Some examples of how to
do these types of operations are in our ChIP-seq example and in our
paper from last year, links here:
https://usegalaxy.org/u/james/p/exercise-chip-seq
https://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012
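The "compare with just coordinates, then join" pattern can be sketched in Python as follows. This only illustrates the idea; in Galaxy the same steps would be done with the interval and join tools. The identifiers, gene names, and coordinates are invented, and intervals are assumed to be 0-based, half-open as in BED.

```python
# Hypothetical full annotation table: id -> (chrom, start, end, gene name, description)
full_annotation = {
    "tx1": ("chr1", 150, 400, "geneA", "example kinase"),
    "tx2": ("chr1", 8000, 9000, "geneB", "example transporter"),
}

# Step 1: reduce the annotation to coordinates plus an identifier.
coords_only = [
    (chrom, start, end, ann_id)
    for ann_id, (chrom, start, end, _name, _desc) in full_annotation.items()
]

# Step 2: compare the peaks against the coordinate-only records.
peaks = [("chr1", 100, 200), ("chr1", 8500, 8600), ("chr2", 100, 200)]
matches = [
    (peak, ann_id)
    for peak in peaks
    for chrom, start, end, ann_id in coords_only
    if peak[0] == chrom and peak[1] < end and start < peak[2]
]

# Step 3: join on the identifier to pick the details back up.
joined = [(peak, ann_id) + full_annotation[ann_id][3:] for peak, ann_id in matches]
```

Carrying only the identifier through the comparison keeps the intermediate datasets small, and the final join restores whatever detail columns you need.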
Please note that the public Main server at usegalaxy.org will be
unavailable during US East coast business hours tomorrow as stated on
the current banner:
"TACC will be performing storage system updates on Tuesday, December 3
from 9 AM to 6 PM EST (UTC -0500). During this time, Galaxy will be
unavailable."
Hopefully this helps!
Jen
Galaxy team
On 12/2/13 6:35 PM, Wooi Lim wrote:
Dear Galaxy,
I am analysing ChIP-Seq data from Illumina using the Galaxy web server. I
mapped the reads with bowtie and did the peak calling with MACS.
The next thing I wanted to do is to annotate the peaks with genomic
regions (e.g. promoter, intergenic, intron) and gene names.
I am not sure if this can be achieved through Galaxy and, if so, how
it can be done. Thank you.
Catheryn
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
--
Jennifer Hillman-Jackson
http://galaxyproject.org