Hi Andrew,
On 07/19/2010 08:11 PM, Andrew Yee wrote:
Is there a way to use subseq() to insert nucleotides?
Take for example foo:
foo<- DNAStringSet('ACTTA')
I'd like to insert e.g. a G between positions 2 and 3 so that foo looks like
'ACGTTA' Is there a way to do this using subseq()? Or is an alternative
function recommended?
Just do:
> foo <- DNAString('ACTTA')
> subseq(foo, start=3, end=2) <- DNAString("G")
> foo
6-letter "DNAString" instance
seq: ACGTTA
Generally speaking, if performance is important, you should get better
results by *not* switching back and forth between DNAString/DNAStringSet
objects and regular character vectors. For example, if you want to
replace the nucleotides starting at pos 101 in Human chrY by those in
chrM:
o Using subseq<-():
> library(BSgenome.Hsapiens.UCSC.hg19)
> chrY <- unmasked(Hsapiens$chrY)
> chrM <- unmasked(Hsapiens$chrM)
> system.time(subseq(chrY, start=101, width=length(chrM)) <- chrM)
user system elapsed
0.190 0.010 0.193
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1158959 61.9 1835812 98.1 1394696 74.5
Vcells 15445470 117.9 17364147 132.5 15568274 118.8
o Using substr():
> library(BSgenome.Hsapiens.UCSC.hg19)
> chrY <- unmasked(Hsapiens$chrY)
> chrM <- unmasked(Hsapiens$chrM)
> system.time({tmp <- as.character(chrY);
tmp2 <- paste(substr(tmp, start=1, stop=100),
as.character(chrM),
substr(tmp, start=101+length(chrM),
stop=length(chrY)),
sep="");
chrY <- DNAString(tmp2)})
user system elapsed
1.860 0.230 2.088
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1128874 60.3 1835812 98.1 1368491 73.1
Vcells 30276832 231.0 35483245 270.8 30284765 231.1
Using subseq<-() is faster and more memory efficient. It's also
more convenient.
Cheers,
H.
Thanks,
Andrew
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: [email protected]
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing