I think you can get the relevant information rapidly from the dbSNP VCF. You would acquire

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-common_all.vcf.gz
ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-common_all.vcf.gz.tbi

and wrap the compressed VCF in a TabixFile, as sketched below.
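A minimal sketch of that setup, assuming both files have been downloaded
(e.g., with download.file()) to the working directory under their original
names; TabixFile() comes from Rsamtools, which VariantAnnotation loads:

library(VariantAnnotation)  # attaches Rsamtools (TabixFile) and GenomicRanges

# point a TabixFile at the bgzipped VCF and its Tabix (.tbi) index;
# the connection stays closed until readVcf() queries it
tf <- TabixFile("00-common_all.vcf.gz",
                index="00-common_all.vcf.gz.tbi")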
The object prints as

> tf
class: TabixFile
path: 00-common_all.vcf.gz
index: 00-common_all.vcf.gz.tbi
isOpen: FALSE
yieldSize: NA

and a call like

rowRanges(readVcf(tf, param=ScanVcfParam(which=GRanges("10", IRanges(1, 50000))),
    genome="hg19"))

then returns fairly quickly. Perhaps AnnotationHub can address this issue.

If you have the file locally:

> system.time(
+   rowRanges(readVcf(tf, param=ScanVcfParam(which=GRanges("10", IRanges(1, 50000))),
+     genome="hg19")))
   user  system elapsed
  0.187   0.009   0.222

If instead you read it directly from NCBI:

> tf2 <- "ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-common_all.vcf.gz"
> system.time(
+   rowRanges(readVcf(tf2, param=ScanVcfParam(which=GRanges("10", IRanges(1, 50000))),
+     genome="hg19")))
   user  system elapsed
  0.237   0.055  16.476

Faster than a speeding snplocs? But perhaps there is information loss or other
diminished functionality.

On Fri, Jun 17, 2016 at 12:53 PM, Robert Castelo <robert.cast...@upf.edu> wrote:
> hi,
>
> the performance of snpsByOverlaps() in terms of time and memory
> consumption is quite poor, and I wonder whether there is some bug in the
> code. Here's one example:
>
> library(GenomicRanges)
> library(SNPlocs.Hsapiens.dbSNP144.GRCh37)
>
> snps <- SNPlocs.Hsapiens.dbSNP144.GRCh37
>
> gr <- GRanges(seqnames="ch10", IRanges(123276830, 123276830))
>
> system.time(ov <- snpsByOverlaps(snps, gr))
>    user  system elapsed
>  33.768   0.124  33.955
>
> system.time(ov <- snpsByOverlaps(snps, gr))
>    user  system elapsed
>  33.150   0.281  33.494
>
> I have shown the call to snpsByOverlaps() twice to account for the
> possibility that the first call was caching data and the second would be
> much faster, but that is not the case.
>
> If I do the same with a larger GRanges object, for instance the one
> attached to this email, memory consumption grows to about 20 GB. To me
> this, in conjunction with the previous observation, suggests something is
> wrong with the caching of the data.
>
> I look forward to your comments and possible solutions.
>
> Thanks!!!
>
> Robert
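For completeness, a sketch that packages the remote query for direct
comparison with snpsByOverlaps(); the function name snpsFromDbsnpVcf and its
default arguments are my own illustration, not an established API. Note that
the dbSNP VCF uses NCBI-style sequence names ("10"), whereas the SNPlocs
package expects "ch10".

library(VariantAnnotation)

# hypothetical helper: fetch dbSNP records overlapping 'gr' straight from
# the Tabix-indexed VCF (readVcf() accepts a local path or a URL, per the
# timings above)
snpsFromDbsnpVcf <- function(gr,
    vcf="ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-common_all.vcf.gz",
    genome="hg19") {
    # restrict the scan to the query ranges so Tabix seeks only those blocks
    rowRanges(readVcf(vcf, genome=genome, param=ScanVcfParam(which=gr)))
}

# usage: the position from the original post, with NCBI-style naming
snps <- snpsFromDbsnpVcf(GRanges("10", IRanges(123276830, 123276830)))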