On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
<[email protected]> wrote:

> Hello,
>
> I have a question about what is the most efficient way to perform my use
> case.
>
> What I have done is get a matchMatrix from an overlap query, then split it:
>
> regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
> indexList <- split(regionSiteMap[, "subject"], regionSiteMap[, "query"])
>
>
Instead of splitting, get the scores and query hits into an Rle:

ol <- findOverlaps(regions, sites)
srle <- Rle(scoreVect[subjectHits(ol)])
qrle <- Rle(queryHits(ol))

Rle compression may not help much if your scores rarely repeat in runs, but
you can now use the query Rle to define Views on the score Rle:

v <- Views(srle, as(qrle, "IRanges"))

Now all the view methods are at your disposal, like viewMeans():

means <- viewMeans(v)
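
For comparison, here is a minimal base-R sketch of the same per-region mean
that also avoids building an 8-million-element list. It is not the Views
approach above, just an alternative under stated assumptions: the toy vectors
stand in for queryHits(ol), subjectHits(ol) and scoreVect, and nRegions is a
hypothetical name for length(regions). All values are made up.

```r
# Toy stand-ins for the objects in this thread (values are invented).
# queryHits/subjectHits mimic the two columns of the overlap result.
queryHits   <- c(1L, 1L, 2L, 4L, 4L, 4L)
subjectHits <- c(1L, 2L, 2L, 3L, 4L, 5L)
scoreVect   <- c(0.5, 1.5, 2.0, 4.0, 6.0)
nRegions    <- 4L  # hypothetical: number of regions; region 3 has no hits

# Score of every overlapping site, in hit order.
hitScores <- scoreVect[subjectHits]

# Sum the scores per region in one vectorised pass; rowsum() returns a
# one-column matrix whose row names are the region indices that had hits.
sums <- rowsum(hitScores, queryHits)[, 1]

# Number of hits per region, including zeros for regions without hits.
counts <- tabulate(queryHits, nbins = nRegions)

# Regions with no overlapping site get NA rather than being dropped.
means <- rep(NA_real_, nRegions)
means[as.integer(names(sums))] <- sums / counts[counts > 0]
```

One design difference to be aware of: this sketch returns one value per
region, with NA where a region overlaps no site, whereas a result built from
runs of query hits has one value per region that has at least one hit.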

Michael


> Now I'd like to, for each region, use the indices of its sites to get the
> sites' scores from a vector and take the mean, like:
>
> means <- sapply(indexList, function(indices) mean(scoreVect[indices]))
>
> The problem with this is that I have ~8 million 'regions' and ~28
> million 'sites'. So indexList is a list of ~8 million elements with a
> few indices in each, and scoreVect is a numeric vector of scores of
> length ~28 million.
>
> Can anyone suggest the fastest way to do this?
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
