On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac <[email protected]> wrote:
> Hello,
>
> I have a question about the most efficient way to handle my use case.
>
> What I have done is get a matchMatrix from findOverlaps(), then split it:
>
> regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
> indexList <- split(regionSiteMap[, "subject"], regionSiteMap[, "query"])
>

Instead of splitting, put the scores and the query hits into Rle objects:

ol <- findOverlaps(regions, sites)
srle <- Rle(scoreVect[subjectHits(ol)])
qrle <- Rle(queryHits(ol))

Run-length encoding may not compress your scores much, but you can now use
the query Rle to define Views on the score Rle:

v <- Views(srle, as(qrle, "IRanges"))

Now all the view methods are at your disposal, such as viewMeans():

means <- viewMeans(v)

Michael

> Now I'd like, for each region, to use the site indices to look up the
> sites' scores in a vector and take their mean, like this:
>
> means <- sapply(indexList, function(indices) mean(scoreVect[indices]))
>
> The problem is that I have ~8 million 'regions' and ~28 million 'sites'.
> So indexList is a list of ~8 million elements with a few indices in each,
> and scoreVect is a numeric vector of scores of length ~28 million.
>
> Can anyone suggest the fastest way to approach this task?
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
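The reason the Views trick works can be sketched in base R without IRanges. This is a minimal sketch, not the IRanges Views/viewMeans API: it uses base rle() and cumsum() instead, and assumes the hits are sorted by query index so that each run of identical query indices is one region's contiguous block of hits. The vectors qHits, sHits and scoreVect below are hypothetical toy data standing in for queryHits(ol), subjectHits(ol) and the real score vector.

```r
# Hypothetical toy data standing in for the real overlap result: qHits and
# sHits play the role of queryHits(ol) and subjectHits(ol), sorted by query.
qHits     <- c(1L, 1L, 1L, 3L, 3L, 4L)
sHits     <- c(2L, 5L, 6L, 1L, 4L, 3L)
scoreVect <- c(10, 20, 30, 40, 50, 60)  # one score per site

# Scores for each hit, in hit order (same role as srle above)
hitScores <- scoreVect[sHits]

# Runs of identical query indices delimit each region's block of hits
# (same role as qrle above); find the start and end of each block
runs   <- rle(qHits)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1L

# Mean of each block as a difference of cumulative sums: one vectorised
# pass, with no per-region function call as in the sapply() version
csum  <- cumsum(c(0, hitScores))
means <- (csum[ends + 1L] - csum[starts]) / runs$lengths
names(means) <- runs$values

print(means)
```

For the toy data this gives means of about 43.33, 25 and 30 for regions 1, 3 and 4; region 2 has no overlapping sites and so gets no entry. Over ~28 million scores, differences of a long cumulative sum can lose some floating-point precision; base rowsum(hitScores, qHits) divided by the per-region hit counts is an alternative that sums each group separately.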
