Hello, I have a question about the most efficient way to handle my use case.
What I have done is obtain a matchMatrix from an overlap, then split it:

    regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
    indexList <- split(regionSiteMap[, "subject"], regionSiteMap[, "query"])

Now, for each region, I'd like to use the indices into the sites to get the sites' scores from a vector and take the mean, like:

    means <- sapply(indexList, function(indices) mean(scoreVect[indices]))

The problem is that I have ~8 million 'regions' and ~28 million 'sites'. So indexList is a list of ~8 million elements with a few indices in each one, and scoreVect is a numeric vector of scores of length ~28 million.

Can anyone suggest the fastest way to approach this task?

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010 Australia

_______________________________________________
Bioc-sig-sequencing mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
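
For illustration, the per-region mean described above can also be written as a grouped sum divided by a grouped count, which avoids calling an R function once per region. The following is only a sketch on toy data, not something timed at the 8 million / 28 million scale: `query` and `subject` are hypothetical stand-ins for the two columns of the match matrix, and the score values are made up. It uses base R's `rowsum()` and `tabulate()`.

```r
# Toy stand-ins for regionSiteMap[, "query"] and regionSiteMap[, "subject"]
# (hypothetical values, for illustration only).
query   <- c(1L, 1L, 2L, 3L, 3L, 3L)   # region index of each overlap
subject <- c(1L, 2L, 3L, 4L, 5L, 6L)   # site index of each overlap
scoreVect <- c(0.5, 1.5, 2.0, 1.0, 2.0, 3.0)

# Sum the scores per region in one vectorised pass.
# rowsum() returns a one-column matrix whose row names are the region indices.
sums <- rowsum(scoreVect[subject], query)

# Count how many sites overlap each region.
counts <- tabulate(query, nbins = max(query))

# Divide sums by counts, keeping only regions that have at least one overlap.
regionIds <- as.integer(rownames(sums))
means <- sums[, 1] / counts[regionIds]

# Reference result from the split/sapply approach, for comparison.
indexList <- split(subject, query)
meansSlow <- sapply(indexList, function(indices) mean(scoreVect[indices]))
```

As with the split/sapply version, regions with no overlapping sites do not appear in the result, and the names of `means` are the region indices as character strings.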
