Thanks, that's very helpful.

Andrew
2010/7/20 Hervé Pagès <[email protected]>

> Hi Andrew,
>
> On 07/19/2010 08:11 PM, Andrew Yee wrote:
>
>> Is there a way to use subseq() to insert nucleotides?
>>
>> Take for example foo:
>>
>> foo <- DNAStringSet('ACTTA')
>>
>> I'd like to insert e.g. a G between positions 2 and 3 so that foo looks
>> like 'ACGTTA' Is there a way to do this using subseq()? Or is an
>> alternative function recommended?
>
> Just do:
>
>   > foo <- DNAString('ACTTA')
>   > subseq(foo, start=3, end=2) <- DNAString("G")
>   > foo
>     6-letter "DNAString" instance
>   seq: ACGTTA
>
> Generally speaking, if performance is important, you should get better
> results by *not* switching back and forth between DNAString/DNAStringSet
> objects and regular character vectors. For example, if you want to
> replace the nucleotides starting at pos 101 in Human chrY by those in
> chrM:
>
>  o Using subseq<-():
>
>   > library(BSgenome.Hsapiens.UCSC.hg19)
>   > chrY <- unmasked(Hsapiens$chrY)
>   > chrM <- unmasked(Hsapiens$chrM)
>   > system.time(subseq(chrY, start=101, width=length(chrM)) <- chrM)
>      user  system elapsed
>     0.190   0.010   0.193
>   > gc()
>               used  (Mb) gc trigger  (Mb) max used  (Mb)
>   Ncells  1158959  61.9    1835812  98.1  1394696  74.5
>   Vcells 15445470 117.9   17364147 132.5 15568274 118.8
>
>  o Using substr():
>
>   > library(BSgenome.Hsapiens.UCSC.hg19)
>   > chrY <- unmasked(Hsapiens$chrY)
>   > chrM <- unmasked(Hsapiens$chrM)
>   > system.time({tmp <- as.character(chrY);
>                  tmp2 <- paste(substr(tmp, start=1, stop=100),
>                                as.character(chrM),
>                                substr(tmp, start=101+length(chrM),
>                                       stop=length(chrY)),
>                                sep="");
>                  chrY <- DNAString(tmp2)})
>      user  system elapsed
>     1.860   0.230   2.088
>   > gc()
>               used  (Mb) gc trigger  (Mb) max used  (Mb)
>   Ncells  1128874  60.3    1835812  98.1  1368491  73.1
>   Vcells 30276832 231.0   35483245 270.8 30284765 231.1
>
> Using subseq<-() is faster and more memory efficient. It's also
> more convenient.
>
> Cheers,
> H.
>
>> Thanks,
>> Andrew
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: [email protected]
> Phone: (206) 667-5791
> Fax: (206) 667-1319
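
For reuse, the zero-width trick from the quoted reply (start=3, end=2 selects the empty range between positions 2 and 3) could be wrapped in a small helper. This is only a sketch: insert_after() is a hypothetical name, not a Biostrings function, and it assumes the Biostrings package is installed and that x and value are both DNAString objects.

```r
library(Biostrings)

# Hypothetical helper (NOT part of Biostrings): insert `value` into `x`
# so that it lands between positions `after` and `after + 1`.
insert_after <- function(x, after, value) {
    # start = after + 1 with end = after describes an empty range, so
    # the replacement inserts `value` without overwriting any letters.
    subseq(x, start = after + 1L, end = after) <- value
    x
}

foo <- DNAString("ACTTA")
foo <- insert_after(foo, 2L, DNAString("G"))
as.character(foo)  # "ACGTTA", matching the output shown in the quoted reply
```

The helper stays on DNAString objects throughout, in line with the advice above about not converting to character vectors and back.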
