Re: [Bioc-sig-seq] IRanges::overlap interface, help writing idiomatic code?

Patrick Aboyoun Fri, 11 Sep 2009 10:10:15 -0700

Michael,

I was recrafting Steve's example, in which he created a numeric vectorfor the non-zero coverage values. He might need a numeric vector forGenomeGraphs.



Patrick



Michael Lawrence wrote:

On Fri, Sep 11, 2009 at 9:51 AM, Patrick Aboyoun <[email protected]<mailto:[email protected]>> wrote:


    Steve,
    I have revised your code to simplify it a bit. I added example
    data as well to make it easier for others to run it. The shift and
    width arguments to coverage are discussed in the coverage man
    page. If you are unsure what the sift argument will do to a set of
    ranges, you can run the shift operation on them (e.g.
    shift(bounds, shift = - start(bounds) + 1)).


    ==== Revised code ====

    suppressMessages(library(IRanges))

    sample.reads <- IRanges(start = c(11869, 11882, 12245, 12554,
    12555, 12557), width = 32)
    bounds <- IRanges(start=11700, end=12550)

    # Get the the reads from my sample.reads

    # object that lie within my gene bounds
    gene.reads <- sample.reads[bounds]

    # Build my coverage Rle vector
    cov <- coverage(gene.reads, shift = - start(bounds) + 1, width =
    width(bounds))


    # Create the non-Rle coverage vector that I will use down stream
    # to make a GenomeGraph::BaseTrack object
    counts <- as.numeric(cov)


Would as.integer() be sufficient here?


    # Build the base-position vector and filter out positions
    # with 0 counts from both vectors
    keep <- counts != 0

    # This is (i) from above
    bases <- as.integer(bounds)[keep]

    # This is (ii) from above
    counts <- counts[keep]

    =====================


    > sessionInfo()
    R version 2.10.0 Under development (unstable) (2009-09-07 r49613)
    i386-apple-darwin9.8.0

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:

[1] stats graphics grDevices utils datasets methodsbaseother attached packages:

    [1] IRanges_1.3.72

    loaded via a namespace (and not attached):
    [1] tools_2.10.0





    Steve Lianoglou wrote:

        Hi Michael,

        Thanks for your suggestions so far ... being able to slice a Range
        object w/ another Range object somehow fell off of my radar.

        -steve

        On Thu, Sep 10, 2009 at 6:25 PM, Michael
        Lawrence<[email protected] <mailto:[email protected]>> wrote:

            On Thu, Sep 10, 2009 at 1:55 PM, Steve Lianoglou
            <[email protected]
            <mailto:[email protected]>> wrote:

                Hi all,

                I'm finding it hard to shed some of my 10 thumbs when
                using the
                IRanges/Rle classes, so can someone suggest the
                "correct" way to do what my
                code does below?

                My goal is to use GenomeGraphs to plot a picture of my
                gene model, along
                with the reads from 1 or more samples above it. I've
                actually got that
                working, but I feel there's a more idiomatic way to do
                this, so if you have
                any comments, I'd be interested in hearing them.

                I'll paste the function in its entirety at the end of
                the email. It works
                as advertised, but perhaps not done in the most
                efficient way. For brevity's
                sake, however, I've distilled the specific code for
                your feedback.

                Anyway, here's the gist:

                1. sample.reads : An IRanges object of the reads
                aligned to a particular
                chromosome from a particular rna-seq sample:

                R> > head(sample.reads)
                IRanges instance:
                  start   end width
                [1] 11869 11900    32
                [2] 11882 11913    32
                [3] 12245 12276    32
                [4] 12554 12585    32
                [5] 12555 12586    32
                [6] 12557 12588    32

                2. bounds: An IRange object that indicates the start
                and end position of
                my gene -- I want all the reads that land here:

                R> bounds
                IRanges instance:
                     start      end width
                [1] 12040238 12073571 33334

                Goal:
                I wan to build two vectors:
                (i) A vector of positions on my chromosome (having
                length == to the length
                of my gene)
                (ii) A vector of counts over those positions

                Positions with 0 counts are removed from (i) and (ii)

                ==== Here's my code ====

                # Get the index of the reads from my sample.reads
                # object that lie within my gene bounds
                which.reads <- subjectHits(overlap(sample.reads, bounds))

                # Slice out these reads
                gene.reads <- reads[which.reads]

            The above two steps could be simply:

                gene.reads <- sample.reads[bounds]
                     # Build my coverage Rle vector
                cov <- coverage(gene.reads)

                # Only keep the part of the Rle that starts @ the start
                # of my gene to the end (I feel like doing this is
                weird (?))
                cov <- cov[start(bounds):length(cov)]

                # Create the non-Rle coverage vector that I will use
                down stream
                # to make a GenomeGraph::BaseTrack object
                counts <- numeric(width(bounds))

                # Populate this vector with the correct coverage counts
                counts[1:length(cov)] <- as.vector(cov)

            As you say, this looks like it could be done easier using
            arguments to
            coverage(). Maybe someone else could explain this?

                # Build the base-position vector and filter out positions
                # with 0 counts from both vectors
                keep <- counts != 0

                # This is (i) from above
                bases <- (start(bounds):end(bounds))[keep]

            A bit shorter:
            bases <- as.integer(bounds)[keep]

                # This is (ii) from above
                counts <- counts[keep]

                =====================

                I feel like the coverage interface, using the
                start/end args would help
                here, no? Wouldn't this have worked to make things a
                bit easier?

                cov <- coverage(gene.reads, start=start(bounds),
                end=end(bounds))

                How would you do that by using the width/shift args?
                Or wouldn't you?
                Maybe I'm not clear on what the original start/end
                args in ``coverage`` are
                meant to do?

                Anyway, I was just curious if there was a better way
                to do what I need to
                do.

                Thanks,

                -steve


                ===== Here's the function in all of its glory, use it
                if you like =====

                Here's a sample call:

                plotGenomeGraphReads(list(Sample1=reads.1,
                Sample2=reads.2),
                  gene.id <http://gene.id>='ENSG00000116688',
                title="MFN2")

                Here's the function:

                plotGenomeGraphReads <- function(sample.reads,
                gene.model=NULL,
                gene.id <http://gene.id>=NULL,
                                               biomart=NULL,
                cols=NULL, title=NULL) {
                 # Plots the read counts over a particular gene model.
                 #
                 # You can pass your own gene.model as an IRange
                object, or use biomaRt to
                 # retrieve the model using the Ensembl Gene ID
                gene.id <http://gene.id>
                 #
                 # Parameters
                 # ----------
                 # sample.reads : named list[IRanges] representing
                reads from separate
                samples
                 #                over a given chromosome (the one
                your gene is on!)
                 # gene.model   : IRanges object of start/end
                positions of the gene
                 # mart         : biomaRt, if you want to query
                ensembl for the gene model
                 # cols         : the colors you want to use for the
                reads in the
                respective
                 #                samples
                 if (is.null(gene.model) && is.null(gene.id
                <http://gene.id>)) {
                  stop("Need to get the gene.model from somewhere.")
                 }

                 # Figure out what type of gene.model we're using and
                get appropriate
                bounds
                 if (!is.null(gene.model)) {
                  gm <- makeGeneModel(start=start(gene.model),
                end=end(gene.model))
                  bounds <- range(gene.model)
                 } else {
                  if (is.null(biomart)) {
                    biomart <- useMart('ensembl',
                dataset='hsapiens_gene_ensembl')
                  }
                  gm <- makeGene(id=gene.id <http://gene.id>,
                type='ensembl_gene_id', biomart=biomart)
                  if (is.null(title)) {
                    title <- gene.id <http://gene.id>
                  }
                  anno <- g...@ens
                  bounds <- range(IRanges(start=anno$exon_chrom_start,
                                          end=anno$exon_chrom_end))
                 }

                 sample.names <- names(sample.reads)

                 if (is.null(cols)) {
                  cols <- rainbow(length(sample.reads))
                  names(cols) <- sample.names
                 }

                 if (is.null(title)) {
                  title <- 'Gene Reads'
                 }

                 stopifnot(length(cols) == length(sample.reads))
                 stopifnot(names(cols) == sample.names)

                 # Calculate the coverage vectors over the gene model
                for each sample
                 sample.coverage <- lapply(sample.names, function(name) {
                  reads <- sample.reads[[name]]
                  which.reads <- subjectHits(overlap(reads, bounds))
                  gene.reads <- reads[which.reads]
                  cov <- coverage(gene.reads)
                  cov <- cov[start(bounds):length(cov)]

                  counts <- numeric(width(bounds))
                  counts[1:length(cov)] <- as.vector(cov)
                  keep <- counts != 0
                  bases <- (start(bounds):end(bounds))[keep]
                  counts <- counts[keep]
                  list(pos=bases, coverage=counts)
                 })
                 names(sample.coverage) <- sample.names

                 # Make y-axis the same for each sample
                 max.counts <- max(sapply(sample.coverage, function(cov)
                max(cov$coverage)))

                 tracks <- lapply(sample.names, function(name) {
                  dp <- DisplayPars(lwd=0.3, color=cols[[name]],
                ylim=c(0, max.counts))
                  makeBaseTrack(base=sample.coverage[[name]]$pos,
                                value=sample.coverage[[name]]$coverage,
                                dp=dp)
                 })
                 names(tracks) <- names(sample.reads)

                 title <- makeTitle(text=title)
                 plot.me <- c(title, tracks, list(gm))

                 gdPlot(plot.me, minBase=start(bounds),
                maxBase=end(bounds))
                }


                --
                Steve Lianoglou
                Graduate Student: Computational Systems Biology
                 |  Memorial Sloan-Kettering Cancer Center
                 |  Weill Medical College of Cornell University
                Contact Info: http://cbio.mskcc.org/~lianos/contact
                <http://cbio.mskcc.org/%7Elianos/contact>

                _______________________________________________
                Bioc-sig-sequencing mailing list
                [email protected]
                <mailto:[email protected]>
                https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] IRanges::overlap interface, help writing idiomatic code?

Reply via email to