Thanks for this suggestion. It is super fast!

- Dario.

---- Original message ----
>Date: Fri, 25 Jun 2010 23:11:25 -0700
>From: Hervé Pagès <[email protected]>
>Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>To: [email protected]
>Cc: Michael Lawrence <[email protected]>,
>[email protected]
>
>Hi Dario,
>
>You can try to use 'successiveIRanges(runLength(qrle))' instead
>of 'as(qrle, "IRanges")'.
>
>Cheers,
>H.
>
>
>On 06/25/2010 01:05 AM, Dario Strbenac wrote:
>> That's a neat and elegant idea, but it's not actually possible to do the
>> following part
>>
>> as(qrle, "IRanges")
>>
>> Error in asMethod(object) :
>> cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an
>> IRanges object
>>
>> Thanks,
>> Dario.
>>
>>
>> ---- Original message ----
>>> Date: Thu, 24 Jun 2010 23:53:08 -0700
>>> From: Michael Lawrence <[email protected]>
>>> Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>>> To: [email protected]
>>> Cc: [email protected]
>>>
>>> On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
>>> <[email protected]> wrote:
>>>
>>>   Hello,
>>>
>>>   I have a question about what is the most efficient
>>>   way to perform my use case.
>>>
>>>   What I have done is gotten a matchMatrix from an
>>>   overlapping, then split it:
>>>
>>>   regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
>>>   indexList <- split(regionSiteMap[, "subject"],
>>>                      regionSiteMap[, "query"])
>>>
>>> Instead of splitting, get the scores and query hits
>>> into an Rle:
>>>
>>> ol <- findOverlaps(regions, sites)
>>> srle <- Rle(scoreVec[subjectHits(ol)])
>>> qrle <- Rle(queryHits(ol))
>>>
>>> The Rle compression may not be appropriate for your
>>> scores, but now you can use the query Rle to define
>>> Views on the score Rle:
>>>
>>> v <- Views(srle, as(qrle, "IRanges"))
>>>
>>> Now all the view methods are at your disposal, like
>>> viewMeans():
>>>
>>> means <- viewMeans(v)
>>>
>>> Michael
>>>
>>>
>>>   Now I'd like to, for each region, use the indices
>>>   to the sites to get the sites' scores from a
>>>   vector and take the mean, like:
>>>
>>>   means <- sapply(indicesList, function(indices)
>>>                   mean(scoreVect[indices]))
>>>
>>>   The problem about this is that I have ~ 8 million
>>>   'regions', and ~ 28 million 'sites'. So the
>>>   indexList is a list of ~ 8 million elements with a
>>>   few indices in each one, and scoresVect is a
>>>   numeric vector of scores of length ~ 28 million.
>>>
>>>   Can anyone suggest what is the fastest way to go
>>>   on this task?
>>>
>>> --------------------------------------
>>> Dario Strbenac
>>> Research Assistant
>>> Cancer Epigenetics
>>> Garvan Institute of Medical Research
>>> Darlinghurst NSW 2010
>>> Australia
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
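
For anyone following the thread, the combination of Michael's Rle/Views idea with Hervé's 'successiveIRanges(runLength(qrle))' fix can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not code posted by anyone in the thread: the toy vectors 'scoreVec', 'qh' and 'sh' are made up and stand in for the real scores, 'queryHits(ol)' and 'subjectHits(ol)', and it assumes the hits are sorted by query, so that each run in 'qrle' corresponds to exactly one region.

```r
library(IRanges)  # also attaches S4Vectors, which provides Rle

# Hypothetical toy data standing in for the real objects in the thread.
scoreVec <- c(1, 2, 3, 4, 5, 6)   # one score per site
qh <- c(1L, 1L, 2L, 4L, 4L, 4L)   # stands in for queryHits(ol), sorted by region
sh <- c(1L, 2L, 3L, 4L, 5L, 6L)   # stands in for subjectHits(ol)

srle <- Rle(scoreVec[sh])         # scores arranged in hit order
qrle <- Rle(qh)                   # one run per region that has at least one hit

# Turn the run lengths into consecutive, non-overlapping ranges over srle:
# here widths 2, 1, 3 give the ranges 1-2, 3-3 and 4-6.
v <- Views(srle, successiveIRanges(runLength(qrle)))

# One mean per view, i.e. one mean per region with hits.
means <- viewMeans(v)
names(means) <- runValue(qrle)    # label each mean with its region index
```

With this toy input, 'means' holds 1.5, 3 and 5 for regions 1, 2 and 4. Region 3 has no overlapping site, so it produces no run in 'qrle' and no entry in 'means'; if a value is needed for every region, the result would have to be expanded afterwards using the names.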
