Hi all,
I'm finding it hard to shed some of my 10 thumbs when using the
IRanges/Rle classes, so can someone suggest the "correct" way
to do what my
code does below?
My goal is to use GenomeGraphs to plot a picture of my gene
model, along
with the reads from 1 or more samples above it. I've actually
got that
working, but I feel there's a more idiomatic way to do this, so
if you have
any comments, I'd be interested in hearing them.
I'll paste the function in its entirety at the end of the
email. It works
as advertised, but perhaps not done in the most efficient way.
For brevity's
sake, however, I've distilled the specific code for your
feedback.
Anyway, here's the gist:
1. sample.reads : An IRanges object of the reads aligned to a
particular
chromosome from a particular rna-seq sample:
R> > head(sample.reads)
IRanges instance:
start end width
[1] 11869 11900 32
[2] 11882 11913 32
[3] 12245 12276 32
[4] 12554 12585 32
[5] 12555 12586 32
[6] 12557 12588 32
2. bounds: An IRange object that indicates the start and end
position of
my gene -- I want all the reads that land here:
R> bounds
IRanges instance:
start end width
[1] 12040238 12073571 33334
Goal:
I wan to build two vectors:
(i) A vector of positions on my chromosome (having length == to
the length
of my gene)
(ii) A vector of counts over those positions
Positions with 0 counts are removed from (i) and (ii)
==== Here's my code ====
# Get the index of the reads from my sample.reads
# object that lie within my gene bounds
which.reads <- subjectHits(overlap(sample.reads, bounds))
# Slice out these reads
gene.reads <- reads[which.reads]
# This is (ii) from above
counts <- counts[keep]
=====================
I feel like the coverage interface, using the start/end args
would help
here, no? Wouldn't this have worked to make things a bit easier?
cov <- coverage(gene.reads, start=start(bounds), end=end(bounds))
How would you do that by using the width/shift args? Or
wouldn't you?
Maybe I'm not clear on what the original start/end args in
``coverage`` are
meant to do?
Anyway, I was just curious if there was a better way to do what
I need to
do.
Thanks,
-steve
===== Here's the function in all of its glory, use it if you
like =====
Here's a sample call:
plotGenomeGraphReads(list(Sample1=reads.1, Sample2=reads.2),
gene.id='ENSG00000116688', title="MFN2")
Here's the function:
plotGenomeGraphReads <- function(sample.reads, gene.model=NULL,
gene.id=NULL,
biomart=NULL, cols=NULL,
title=NULL) {
# Plots the read counts over a particular gene model.
#
# You can pass your own gene.model as an IRange object, or use
biomaRt to
# retrieve the model using the Ensembl Gene ID gene.id
#
# Parameters
# ----------
# sample.reads : named list[IRanges] representing reads from
separate
samples
# over a given chromosome (the one your gene is
on!)
# gene.model : IRanges object of start/end positions of the
gene
# mart : biomaRt, if you want to query ensembl for the
gene model
# cols : the colors you want to use for the reads in the
respective
# samples
if (is.null(gene.model) && is.null(gene.id)) {
stop("Need to get the gene.model from somewhere.")
}
# Figure out what type of gene.model we're using and get
appropriate
bounds
if (!is.null(gene.model)) {
gm <- makeGeneModel(start=start(gene.model), end=end
(gene.model))
bounds <- range(gene.model)
} else {
if (is.null(biomart)) {
biomart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
}
gm <- makeGene(id=gene.id, type='ensembl_gene_id',
biomart=biomart)
if (is.null(title)) {
title <- gene.id
}
anno <- g...@ens
bounds <- range(IRanges(start=anno$exon_chrom_start,
end=anno$exon_chrom_end))
}
sample.names <- names(sample.reads)
if (is.null(cols)) {
cols <- rainbow(length(sample.reads))
names(cols) <- sample.names
}
if (is.null(title)) {
title <- 'Gene Reads'
}
stopifnot(length(cols) == length(sample.reads))
stopifnot(names(cols) == sample.names)
# Calculate the coverage vectors over the gene model for each
sample
sample.coverage <- lapply(sample.names, function(name) {
reads <- sample.reads[[name]]
which.reads <- subjectHits(overlap(reads, bounds))
gene.reads <- reads[which.reads]
cov <- coverage(gene.reads)
cov <- cov[start(bounds):length(cov)]
counts <- numeric(width(bounds))
counts[1:length(cov)] <- as.vector(cov)
keep <- counts != 0
bases <- (start(bounds):end(bounds))[keep]
counts <- counts[keep]
list(pos=bases, coverage=counts)
})
names(sample.coverage) <- sample.names
# Make y-axis the same for each sample
max.counts <- max(sapply(sample.coverage, function(cov)
max(cov$coverage)))
tracks <- lapply(sample.names, function(name) {
dp <- DisplayPars(lwd=0.3, color=cols[[name]], ylim=c(0,
max.counts))
makeBaseTrack(base=sample.coverage[[name]]$pos,
value=sample.coverage[[name]]$coverage,
dp=dp)
})
names(tracks) <- names(sample.reads)
title <- makeTitle(text=title)
plot.me <- c(title, tracks, list(gm))
gdPlot(plot.me, minBase=start(bounds), maxBase=end(bounds))
}
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing