hi,

the performance of snpsByOverlaps() in terms of time and memory consumption is quite poor and i wonder whether there is some bug in the code. here's one example:

library(GenomicRanges)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)

snps <- SNPlocs.Hsapiens.dbSNP144.GRCh37

gr <- GRanges(seqnames="ch10", IRanges(123276830, 123276830))

system.time(ov <- snpsByOverlaps(snps, gr))
   user  system elapsed
 33.768   0.124  33.955

system.time(ov <- snpsByOverlaps(snps, gr))
   user  system elapsed
 33.150   0.281  33.494


i've run the call to snpsByOverlaps() twice in case the first call was caching data and the second one would be much faster, but that is not the case.

if i do the same with a larger GRanges object, for instance the one attached to this email, memory consumption grows to about 20 Gbytes. to me this, in conjunction with the previous observation, suggests that something is wrong with the caching of the data.
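
in case it helps to reproduce the memory growth, here is a minimal sketch of how the peak could be measured from within R using only base gc(). peak_mem_mb() is a hypothetical helper written for this email (it is not part of any package), big_gr is just a placeholder name for the larger GRanges object, and this is not necessarily how the figure above was obtained:

```r
# hypothetical helper, not part of any package: returns the approximate
# peak memory (in Mb) that R used above its baseline while evaluating expr
peak_mem_mb <- function(expr) {
  g0 <- gc(reset = TRUE)        # collect garbage and reset "max used"
  baseline <- sum(g0[, 2])      # column 2 is "used" expressed in Mb
  force(expr)                   # evaluate the (lazily passed) expression
  g1 <- gc()
  sum(g1[, ncol(g1)]) - baseline  # last column is "max used" in Mb
}

# stand-in workload so the sketch runs without the SNPlocs package;
# with the real data the call would be something like
#   peak_mem_mb(ov <- snpsByOverlaps(snps, big_gr))
peak_mem_mb(x <- numeric(1e7))
```

the number is only what R's own allocator reports, so memory held outside the R heap would not show up here.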



i look forward to your comments and possible solutions,


thanks!!!


robert.
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
