On Fri, Sep 11, 2009 at 9:51 AM, Patrick Aboyoun <[email protected]
<mailto:[email protected]>> wrote:
Steve,
I have revised your code to simplify it a bit. I added example
data as well to make it easier for others to run it. The shift and
width arguments to coverage are discussed in the coverage man
page. If you are unsure what the sift argument will do to a set of
ranges, you can run the shift operation on them (e.g.
shift(bounds, shift = - start(bounds) + 1)).
==== Revised code ====
suppressMessages(library(IRanges))
sample.reads <- IRanges(start = c(11869, 11882, 12245, 12554,
12555, 12557), width = 32)
bounds <- IRanges(start=11700, end=12550)
# Get the the reads from my sample.reads
# object that lie within my gene bounds
gene.reads <- sample.reads[bounds]
# Build my coverage Rle vector
cov <- coverage(gene.reads, shift = - start(bounds) + 1, width =
width(bounds))
# Create the non-Rle coverage vector that I will use down stream
# to make a GenomeGraph::BaseTrack object
counts <- as.numeric(cov)
Would as.integer() be sufficient here?
# Build the base-position vector and filter out positions
# with 0 counts from both vectors
keep <- counts != 0
# This is (i) from above
bases <- as.integer(bounds)[keep]
# This is (ii) from above
counts <- counts[keep]
=====================
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-09-07 r49613)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
base
other attached packages:
[1] IRanges_1.3.72
loaded via a namespace (and not attached):
[1] tools_2.10.0
Steve Lianoglou wrote:
Hi Michael,
Thanks for your suggestions so far ... being able to slice a Range
object w/ another Range object somehow fell off of my radar.
-steve
On Thu, Sep 10, 2009 at 6:25 PM, Michael
Lawrence<[email protected] <mailto:[email protected]>> wrote:
On Thu, Sep 10, 2009 at 1:55 PM, Steve Lianoglou
<[email protected]
<mailto:[email protected]>> wrote:
Hi all,
I'm finding it hard to shed some of my 10 thumbs when
using the
IRanges/Rle classes, so can someone suggest the
"correct" way to do what my
code does below?
My goal is to use GenomeGraphs to plot a picture of my
gene model, along
with the reads from 1 or more samples above it. I've
actually got that
working, but I feel there's a more idiomatic way to do
this, so if you have
any comments, I'd be interested in hearing them.
I'll paste the function in its entirety at the end of
the email. It works
as advertised, but perhaps not done in the most
efficient way. For brevity's
sake, however, I've distilled the specific code for
your feedback.
Anyway, here's the gist:
1. sample.reads : An IRanges object of the reads
aligned to a particular
chromosome from a particular rna-seq sample:
R> > head(sample.reads)
IRanges instance:
start end width
[1] 11869 11900 32
[2] 11882 11913 32
[3] 12245 12276 32
[4] 12554 12585 32
[5] 12555 12586 32
[6] 12557 12588 32
2. bounds: An IRange object that indicates the start
and end position of
my gene -- I want all the reads that land here:
R> bounds
IRanges instance:
start end width
[1] 12040238 12073571 33334
Goal:
I wan to build two vectors:
(i) A vector of positions on my chromosome (having
length == to the length
of my gene)
(ii) A vector of counts over those positions
Positions with 0 counts are removed from (i) and (ii)
==== Here's my code ====
# Get the index of the reads from my sample.reads
# object that lie within my gene bounds
which.reads <- subjectHits(overlap(sample.reads, bounds))
# Slice out these reads
gene.reads <- reads[which.reads]
The above two steps could be simply:
gene.reads <- sample.reads[bounds]
# Build my coverage Rle vector
cov <- coverage(gene.reads)
# Only keep the part of the Rle that starts @ the start
# of my gene to the end (I feel like doing this is
weird (?))
cov <- cov[start(bounds):length(cov)]
# Create the non-Rle coverage vector that I will use
down stream
# to make a GenomeGraph::BaseTrack object
counts <- numeric(width(bounds))
# Populate this vector with the correct coverage counts
counts[1:length(cov)] <- as.vector(cov)
As you say, this looks like it could be done easier using
arguments to
coverage(). Maybe someone else could explain this?
# Build the base-position vector and filter out positions
# with 0 counts from both vectors
keep <- counts != 0
# This is (i) from above
bases <- (start(bounds):end(bounds))[keep]
A bit shorter:
bases <- as.integer(bounds)[keep]
# This is (ii) from above
counts <- counts[keep]
=====================
I feel like the coverage interface, using the
start/end args would help
here, no? Wouldn't this have worked to make things a
bit easier?
cov <- coverage(gene.reads, start=start(bounds),
end=end(bounds))
How would you do that by using the width/shift args?
Or wouldn't you?
Maybe I'm not clear on what the original start/end
args in ``coverage`` are
meant to do?
Anyway, I was just curious if there was a better way
to do what I need to
do.
Thanks,
-steve
===== Here's the function in all of its glory, use it
if you like =====
Here's a sample call:
plotGenomeGraphReads(list(Sample1=reads.1,
Sample2=reads.2),
gene.id <http://gene.id>='ENSG00000116688',
title="MFN2")
Here's the function:
plotGenomeGraphReads <- function(sample.reads,
gene.model=NULL,
gene.id <http://gene.id>=NULL,
biomart=NULL,
cols=NULL, title=NULL) {
# Plots the read counts over a particular gene model.
#
# You can pass your own gene.model as an IRange
object, or use biomaRt to
# retrieve the model using the Ensembl Gene ID
gene.id <http://gene.id>
#
# Parameters
# ----------
# sample.reads : named list[IRanges] representing
reads from separate
samples
# over a given chromosome (the one
your gene is on!)
# gene.model : IRanges object of start/end
positions of the gene
# mart : biomaRt, if you want to query
ensembl for the gene model
# cols : the colors you want to use for the
reads in the
respective
# samples
if (is.null(gene.model) && is.null(gene.id
<http://gene.id>)) {
stop("Need to get the gene.model from somewhere.")
}
# Figure out what type of gene.model we're using and
get appropriate
bounds
if (!is.null(gene.model)) {
gm <- makeGeneModel(start=start(gene.model),
end=end(gene.model))
bounds <- range(gene.model)
} else {
if (is.null(biomart)) {
biomart <- useMart('ensembl',
dataset='hsapiens_gene_ensembl')
}
gm <- makeGene(id=gene.id <http://gene.id>,
type='ensembl_gene_id', biomart=biomart)
if (is.null(title)) {
title <- gene.id <http://gene.id>
}
anno <- g...@ens
bounds <- range(IRanges(start=anno$exon_chrom_start,
end=anno$exon_chrom_end))
}
sample.names <- names(sample.reads)
if (is.null(cols)) {
cols <- rainbow(length(sample.reads))
names(cols) <- sample.names
}
if (is.null(title)) {
title <- 'Gene Reads'
}
stopifnot(length(cols) == length(sample.reads))
stopifnot(names(cols) == sample.names)
# Calculate the coverage vectors over the gene model
for each sample
sample.coverage <- lapply(sample.names, function(name) {
reads <- sample.reads[[name]]
which.reads <- subjectHits(overlap(reads, bounds))
gene.reads <- reads[which.reads]
cov <- coverage(gene.reads)
cov <- cov[start(bounds):length(cov)]
counts <- numeric(width(bounds))
counts[1:length(cov)] <- as.vector(cov)
keep <- counts != 0
bases <- (start(bounds):end(bounds))[keep]
counts <- counts[keep]
list(pos=bases, coverage=counts)
})
names(sample.coverage) <- sample.names
# Make y-axis the same for each sample
max.counts <- max(sapply(sample.coverage, function(cov)
max(cov$coverage)))
tracks <- lapply(sample.names, function(name) {
dp <- DisplayPars(lwd=0.3, color=cols[[name]],
ylim=c(0, max.counts))
makeBaseTrack(base=sample.coverage[[name]]$pos,
value=sample.coverage[[name]]$coverage,
dp=dp)
})
names(tracks) <- names(sample.reads)
title <- makeTitle(text=title)
plot.me <- c(title, tracks, list(gm))
gdPlot(plot.me, minBase=start(bounds),
maxBase=end(bounds))
}
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
<http://cbio.mskcc.org/%7Elianos/contact>
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
<mailto:[email protected]>
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing