On Fri, Sep 11, 2009 at 9:51 AM, Patrick Aboyoun <[email protected]> wrote:
> Steve, > I have revised your code to simplify it a bit. I added example data as well > to make it easier for others to run it. The shift and width arguments to > coverage are discussed in the coverage man page. If you are unsure what the > sift argument will do to a set of ranges, you can run the shift operation on > them (e.g. shift(bounds, shift = - start(bounds) + 1)). > > > ==== Revised code ==== > > suppressMessages(library(IRanges)) > > sample.reads <- IRanges(start = c(11869, 11882, 12245, 12554, 12555, > 12557), width = 32) > bounds <- IRanges(start=11700, end=12550) > > # Get the the reads from my sample.reads > # object that lie within my gene bounds > gene.reads <- sample.reads[bounds] > > # Build my coverage Rle vector > cov <- coverage(gene.reads, shift = - start(bounds) + 1, width = > width(bounds)) > > # Create the non-Rle coverage vector that I will use down stream > # to make a GenomeGraph::BaseTrack object > counts <- as.numeric(cov) > Would as.integer() be sufficient here? > > # Build the base-position vector and filter out positions > # with 0 counts from both vectors > keep <- counts != 0 > > # This is (i) from above > bases <- as.integer(bounds)[keep] > > # This is (ii) from above > counts <- counts[keep] > > ===================== > > > > sessionInfo() > R version 2.10.0 Under development (unstable) (2009-09-07 r49613) > i386-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] IRanges_1.3.72 > > loaded via a namespace (and not attached): > [1] tools_2.10.0 > > > > > > Steve Lianoglou wrote: > >> Hi Michael, >> >> Thanks for your suggestions so far ... being able to slice a Range >> object w/ another Range object somehow fell off of my radar. >> >> -steve >> >> On Thu, Sep 10, 2009 at 6:25 PM, Michael Lawrence<[email protected]> >> wrote: >> >> >>> On Thu, Sep 10, 2009 at 1:55 PM, Steve Lianoglou >>> <[email protected]> wrote: >>> >>> >>>> Hi all, >>>> >>>> I'm finding it hard to shed some of my 10 thumbs when using the >>>> IRanges/Rle classes, so can someone suggest the "correct" way to do what >>>> my >>>> code does below? >>>> >>>> My goal is to use GenomeGraphs to plot a picture of my gene model, along >>>> with the reads from 1 or more samples above it. I've actually got that >>>> working, but I feel there's a more idiomatic way to do this, so if you >>>> have >>>> any comments, I'd be interested in hearing them. >>>> >>>> I'll paste the function in its entirety at the end of the email. It >>>> works >>>> as advertised, but perhaps not done in the most efficient way. For >>>> brevity's >>>> sake, however, I've distilled the specific code for your feedback. >>>> >>>> Anyway, here's the gist: >>>> >>>> 1. sample.reads : An IRanges object of the reads aligned to a particular >>>> chromosome from a particular rna-seq sample: >>>> >>>> R> > head(sample.reads) >>>> IRanges instance: >>>> start end width >>>> [1] 11869 11900 32 >>>> [2] 11882 11913 32 >>>> [3] 12245 12276 32 >>>> [4] 12554 12585 32 >>>> [5] 12555 12586 32 >>>> [6] 12557 12588 32 >>>> >>>> 2. bounds: An IRange object that indicates the start and end position of >>>> my gene -- I want all the reads that land here: >>>> >>>> R> bounds >>>> IRanges instance: >>>> start end width >>>> [1] 12040238 12073571 33334 >>>> >>>> Goal: >>>> I wan to build two vectors: >>>> (i) A vector of positions on my chromosome (having length == to the >>>> length >>>> of my gene) >>>> (ii) A vector of counts over those positions >>>> >>>> Positions with 0 counts are removed from (i) and (ii) >>>> >>>> ==== Here's my code ==== >>>> >>>> # Get the index of the reads from my sample.reads >>>> # object that lie within my gene bounds >>>> which.reads <- subjectHits(overlap(sample.reads, bounds)) >>>> >>>> # Slice out these reads >>>> gene.reads <- reads[which.reads] >>>> >>>> >>>> >>> The above two steps could be simply: >>> >>> >>>> gene.reads <- sample.reads[bounds] >>>> # Build my coverage Rle vector >>>> cov <- coverage(gene.reads) >>>> >>>> # Only keep the part of the Rle that starts @ the start >>>> # of my gene to the end (I feel like doing this is weird (?)) >>>> cov <- cov[start(bounds):length(cov)] >>>> >>>> # Create the non-Rle coverage vector that I will use down stream >>>> # to make a GenomeGraph::BaseTrack object >>>> counts <- numeric(width(bounds)) >>>> >>>> # Populate this vector with the correct coverage counts >>>> counts[1:length(cov)] <- as.vector(cov) >>>> >>>> >>>> >>> As you say, this looks like it could be done easier using arguments to >>> coverage(). Maybe someone else could explain this? >>> >>> >>> >>>> # Build the base-position vector and filter out positions >>>> # with 0 counts from both vectors >>>> keep <- counts != 0 >>>> >>>> # This is (i) from above >>>> bases <- (start(bounds):end(bounds))[keep] >>>> >>>> >>>> >>> A bit shorter: >>> bases <- as.integer(bounds)[keep] >>> >>> >>> >>>> # This is (ii) from above >>>> counts <- counts[keep] >>>> >>>> ===================== >>>> >>>> I feel like the coverage interface, using the start/end args would help >>>> here, no? Wouldn't this have worked to make things a bit easier? >>>> >>>> cov <- coverage(gene.reads, start=start(bounds), end=end(bounds)) >>>> >>>> How would you do that by using the width/shift args? Or wouldn't you? >>>> Maybe I'm not clear on what the original start/end args in ``coverage`` >>>> are >>>> meant to do? >>>> >>>> Anyway, I was just curious if there was a better way to do what I need >>>> to >>>> do. >>>> >>>> Thanks, >>>> >>>> -steve >>>> >>>> >>>> ===== Here's the function in all of its glory, use it if you like ===== >>>> >>>> Here's a sample call: >>>> >>>> plotGenomeGraphReads(list(Sample1=reads.1, Sample2=reads.2), >>>> gene.id='ENSG00000116688', title="MFN2") >>>> >>>> Here's the function: >>>> >>>> plotGenomeGraphReads <- function(sample.reads, gene.model=NULL, >>>> gene.id=NULL, >>>> biomart=NULL, cols=NULL, title=NULL) { >>>> # Plots the read counts over a particular gene model. >>>> # >>>> # You can pass your own gene.model as an IRange object, or use biomaRt >>>> to >>>> # retrieve the model using the Ensembl Gene ID gene.id >>>> # >>>> # Parameters >>>> # ---------- >>>> # sample.reads : named list[IRanges] representing reads from separate >>>> samples >>>> # over a given chromosome (the one your gene is on!) >>>> # gene.model : IRanges object of start/end positions of the gene >>>> # mart : biomaRt, if you want to query ensembl for the gene >>>> model >>>> # cols : the colors you want to use for the reads in the >>>> respective >>>> # samples >>>> if (is.null(gene.model) && is.null(gene.id)) { >>>> stop("Need to get the gene.model from somewhere.") >>>> } >>>> >>>> # Figure out what type of gene.model we're using and get appropriate >>>> bounds >>>> if (!is.null(gene.model)) { >>>> gm <- makeGeneModel(start=start(gene.model), end=end(gene.model)) >>>> bounds <- range(gene.model) >>>> } else { >>>> if (is.null(biomart)) { >>>> biomart <- useMart('ensembl', dataset='hsapiens_gene_ensembl') >>>> } >>>> gm <- makeGene(id=gene.id, type='ensembl_gene_id', biomart=biomart) >>>> if (is.null(title)) { >>>> title <- gene.id >>>> } >>>> anno <- g...@ens >>>> bounds <- range(IRanges(start=anno$exon_chrom_start, >>>> end=anno$exon_chrom_end)) >>>> } >>>> >>>> sample.names <- names(sample.reads) >>>> >>>> if (is.null(cols)) { >>>> cols <- rainbow(length(sample.reads)) >>>> names(cols) <- sample.names >>>> } >>>> >>>> if (is.null(title)) { >>>> title <- 'Gene Reads' >>>> } >>>> >>>> stopifnot(length(cols) == length(sample.reads)) >>>> stopifnot(names(cols) == sample.names) >>>> >>>> # Calculate the coverage vectors over the gene model for each sample >>>> sample.coverage <- lapply(sample.names, function(name) { >>>> reads <- sample.reads[[name]] >>>> which.reads <- subjectHits(overlap(reads, bounds)) >>>> gene.reads <- reads[which.reads] >>>> cov <- coverage(gene.reads) >>>> cov <- cov[start(bounds):length(cov)] >>>> >>>> counts <- numeric(width(bounds)) >>>> counts[1:length(cov)] <- as.vector(cov) >>>> keep <- counts != 0 >>>> bases <- (start(bounds):end(bounds))[keep] >>>> counts <- counts[keep] >>>> list(pos=bases, coverage=counts) >>>> }) >>>> names(sample.coverage) <- sample.names >>>> >>>> # Make y-axis the same for each sample >>>> max.counts <- max(sapply(sample.coverage, function(cov) >>>> max(cov$coverage))) >>>> >>>> tracks <- lapply(sample.names, function(name) { >>>> dp <- DisplayPars(lwd=0.3, color=cols[[name]], ylim=c(0, max.counts)) >>>> makeBaseTrack(base=sample.coverage[[name]]$pos, >>>> value=sample.coverage[[name]]$coverage, >>>> dp=dp) >>>> }) >>>> names(tracks) <- names(sample.reads) >>>> >>>> title <- makeTitle(text=title) >>>> plot.me <- c(title, tracks, list(gm)) >>>> >>>> gdPlot(plot.me, minBase=start(bounds), maxBase=end(bounds)) >>>> } >>>> >>>> >>>> -- >>>> Steve Lianoglou >>>> Graduate Student: Computational Systems Biology >>>> | Memorial Sloan-Kettering Cancer Center >>>> | Weill Medical College of Cornell University >>>> Contact Info: >>>> http://cbio.mskcc.org/~lianos/contact<http://cbio.mskcc.org/%7Elianos/contact> >>>> >>>> _______________________________________________ >>>> Bioc-sig-sequencing mailing list >>>> [email protected] >>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing >>>> >>>> >>> >>> >> >> >> >> >> > > [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
