Re: [Bioc-sig-seq] IRanges::overlap interface, help writing idiomatic code?

Steve Lianoglou Fri, 11 Sep 2009 11:46:23 -0700

Hi,

On Sep 11, 2009, at 2:18 PM, Patrick Aboyoun wrote:

Steve,
Good point. I added a clarifying statement in coverage's siftargument description:
\item{shift}{
  For most methods, an integer vector (recycled to the length of
  \code{x}) specifying how each element in \code{x} should be
  (horizontally) shifted before the coverage is computed. Only shifted
indices in the range [1, \code{width}] will be included in thecoverage
  calculation.
  ...
}


Yes, that's great.

Thanks again,
-steve

Patrick


Steve Lianoglou wrote:
Hi Patrick,

Your suggestions were what I was looking for, they're great, thanks!

On Sep 11, 2009, at 12:51 PM, Patrick Aboyoun wrote:
Steve,
I have revised your code to simplify it a bit. I added exampledata as well to make it easier for others to run it.
Yeah, I should have pasted the objects in an easy-to-paste-into-R-format, sorry for the oversight.
The shift and width arguments to coverage are discussed in thecoverage man page.
I was wrestling with them trying to figure out the correct way Iwas expected to use them ... it just wasn't all that intuitive, tobe honest.
I should have noticed in the examples section that the coverageisn't calculated for negative ranges (which is what you get whenyou shift all of your ranges to start at start(bounds)) ... thoughit would be clear if I paid more attention to the very firstexample of ``coverage`` usage.
Perhaps you might consider making an explicit comment in thedocumentation for the width parameter regarding this fact?
Thanks again,
-steve
If you are unsure what the sift argument will do to a set ofranges, you can run the shift operation on them (e.g. shift(bounds, shift = - start(bounds) + 1)).
==== Revised code ====

suppressMessages(library(IRanges))
sample.reads <- IRanges(start = c(11869, 11882, 12245, 12554,12555, 12557), width = 32)
bounds <- IRanges(start=11700, end=12550)

# Get the the reads from my sample.reads
# object that lie within my gene bounds
gene.reads <- sample.reads[bounds]

# Build my coverage Rle vector
cov <- coverage(gene.reads, shift = - start(bounds) + 1, width =width(bounds))
# Create the non-Rle coverage vector that I will use down stream
# to make a GenomeGraph::BaseTrack object
counts <- as.numeric(cov)

# Build the base-position vector and filter out positions
# with 0 counts from both vectors
keep <- counts != 0

# This is (i) from above
bases <- as.integer(bounds)[keep]

# This is (ii) from above
counts <- counts[keep]

=====================


> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-09-07 r49613)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] IRanges_1.3.72

loaded via a namespace (and not attached):
[1] tools_2.10.0




Steve Lianoglou wrote:
Hi Michael,

Thanks for your suggestions so far ... being able to slice a Range
object w/ another Range object somehow fell off of my radar.

-steve
On Thu, Sep 10, 2009 at 6:25 PM, Michael Lawrence<[email protected]> wrote:
On Thu, Sep 10, 2009 at 1:55 PM, Steve Lianoglou
<[email protected]> wrote:
Hi all,

I'm finding it hard to shed some of my 10 thumbs when using the
IRanges/Rle classes, so can someone suggest the "correct" wayto do what my
code does below?
My goal is to use GenomeGraphs to plot a picture of my genemodel, alongwith the reads from 1 or more samples above it. I've actuallygot thatworking, but I feel there's a more idiomatic way to do this, soif you have
any comments, I'd be interested in hearing them.
I'll paste the function in its entirety at the end of theemail. It worksas advertised, but perhaps not done in the most efficient way.For brevity'ssake, however, I've distilled the specific code for yourfeedback.
Anyway, here's the gist:
1. sample.reads : An IRanges object of the reads aligned to aparticular
chromosome from a particular rna-seq sample:

R> > head(sample.reads)
IRanges instance:
 start   end width
[1] 11869 11900    32
[2] 11882 11913    32
[3] 12245 12276    32
[4] 12554 12585    32
[5] 12555 12586    32
[6] 12557 12588    32
2. bounds: An IRange object that indicates the start and endposition of
my gene -- I want all the reads that land here:

R> bounds
IRanges instance:
    start      end width
[1] 12040238 12073571 33334

Goal:
I wan to build two vectors:
(i) A vector of positions on my chromosome (having length == tothe length
of my gene)
(ii) A vector of counts over those positions

Positions with 0 counts are removed from (i) and (ii)

==== Here's my code ====

# Get the index of the reads from my sample.reads
# object that lie within my gene bounds
which.reads <- subjectHits(overlap(sample.reads, bounds))

# Slice out these reads
gene.reads <- reads[which.reads]
The above two steps could be simply:
gene.reads <- sample.reads[bounds]
    # Build my coverage Rle vector
cov <- coverage(gene.reads)

# Only keep the part of the Rle that starts @ the start
# of my gene to the end (I feel like doing this is weird (?))
cov <- cov[start(bounds):length(cov)]

# Create the non-Rle coverage vector that I will use down stream
# to make a GenomeGraph::BaseTrack object
counts <- numeric(width(bounds))

# Populate this vector with the correct coverage counts
counts[1:length(cov)] <- as.vector(cov)
As you say, this looks like it could be done easier usingarguments to
coverage(). Maybe someone else could explain this?
# Build the base-position vector and filter out positions
# with 0 counts from both vectors
keep <- counts != 0

# This is (i) from above
bases <- (start(bounds):end(bounds))[keep]
A bit shorter:
bases <- as.integer(bounds)[keep]
# This is (ii) from above
counts <- counts[keep]

=====================
I feel like the coverage interface, using the start/end argswould help
here, no? Wouldn't this have worked to make things a bit easier?

cov <- coverage(gene.reads, start=start(bounds), end=end(bounds))
How would you do that by using the width/shift args? Orwouldn't you?Maybe I'm not clear on what the original start/end args in``coverage`` are
meant to do?
Anyway, I was just curious if there was a better way to do whatI need to
do.

Thanks,

-steve
===== Here's the function in all of its glory, use it if youlike =====
Here's a sample call:

plotGenomeGraphReads(list(Sample1=reads.1, Sample2=reads.2),
 gene.id='ENSG00000116688', title="MFN2")

Here's the function:

plotGenomeGraphReads <- function(sample.reads, gene.model=NULL,
gene.id=NULL,
biomart=NULL, cols=NULL,title=NULL) {
# Plots the read counts over a particular gene model.
#
# You can pass your own gene.model as an IRange object, or usebiomaRt to
# retrieve the model using the Ensembl Gene ID gene.id
#
# Parameters
# ----------
# sample.reads : named list[IRanges] representing reads fromseparate
samples
# over a given chromosome (the one your gene ison!)# gene.model : IRanges object of start/end positions of thegene# mart : biomaRt, if you want to query ensembl for thegene model
# cols         : the colors you want to use for the reads in the
respective
#                samples
if (is.null(gene.model) && is.null(gene.id)) {
 stop("Need to get the gene.model from somewhere.")
}
# Figure out what type of gene.model we're using and getappropriate
bounds
if (!is.null(gene.model)) {
gm <- makeGeneModel(start=start(gene.model), end=end(gene.model))
 bounds <- range(gene.model)
} else {
 if (is.null(biomart)) {
   biomart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
 }
gm <- makeGene(id=gene.id, type='ensembl_gene_id',biomart=biomart)
 if (is.null(title)) {
   title <- gene.id
 }
 anno <- g...@ens
 bounds <- range(IRanges(start=anno$exon_chrom_start,
                         end=anno$exon_chrom_end))
}

sample.names <- names(sample.reads)

if (is.null(cols)) {
 cols <- rainbow(length(sample.reads))
 names(cols) <- sample.names
}

if (is.null(title)) {
 title <- 'Gene Reads'
}

stopifnot(length(cols) == length(sample.reads))
stopifnot(names(cols) == sample.names)
# Calculate the coverage vectors over the gene model for eachsample
sample.coverage <- lapply(sample.names, function(name) {
 reads <- sample.reads[[name]]
 which.reads <- subjectHits(overlap(reads, bounds))
 gene.reads <- reads[which.reads]
 cov <- coverage(gene.reads)
 cov <- cov[start(bounds):length(cov)]

 counts <- numeric(width(bounds))
 counts[1:length(cov)] <- as.vector(cov)
 keep <- counts != 0
 bases <- (start(bounds):end(bounds))[keep]
 counts <- counts[keep]
 list(pos=bases, coverage=counts)
})
names(sample.coverage) <- sample.names

# Make y-axis the same for each sample
max.counts <- max(sapply(sample.coverage, function(cov)
max(cov$coverage)))

tracks <- lapply(sample.names, function(name) {
dp <- DisplayPars(lwd=0.3, color=cols[[name]], ylim=c(0,max.counts))
 makeBaseTrack(base=sample.coverage[[name]]$pos,
               value=sample.coverage[[name]]$coverage,
               dp=dp)
})
names(tracks) <- names(sample.reads)

title <- makeTitle(text=title)
plot.me <- c(title, tracks, list(gm))

gdPlot(plot.me, minBase=start(bounds), maxBase=end(bounds))
}


--
Steve Lianoglou
Graduate Student: Computational Systems Biology
|  Memorial Sloan-Kettering Cancer Center
|  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 |  Memorial Sloan-Kettering Cancer Center
 |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] IRanges::overlap interface, help writing idiomatic code?

Reply via email to