Hi Andrew,

On 07/19/2010 08:11 PM, Andrew Yee wrote:
Is there a way to use subseq() to insert nucleotides?

Take for example foo:

foo <- DNAStringSet('ACTTA')

I'd like to insert e.g. a G between positions 2 and 3 so that foo looks like
'ACGTTA'. Is there a way to do this using subseq()? Or is an alternative
function recommended?

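Yes. With a DNAString, assign to the empty (zero-width) range between
positions 2 and 3, i.e. start=3, end=2: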

  > foo <- DNAString('ACTTA')
  > subseq(foo, start=3, end=2) <- DNAString("G")
  > foo
    6-letter "DNAString" instance
  seq: ACGTTA
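
If you need to do this in several places, the same idea can be wrapped in
a small helper. This is only a sketch: insert_after() is a hypothetical
name of mine, not a Biostrings function, and it assumes 'x' and 'value'
are both DNAString objects.

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings): return a copy of 'x'
# with 'value' inserted right after position 'pos'.
insert_after <- function(x, pos, value) {
  # start = pos + 1, end = pos describes the zero-width range that sits
  # between positions 'pos' and 'pos + 1', so nothing is overwritten.
  subseq(x, start = pos + 1, end = pos) <- value
  x
}

foo <- DNAString("ACTTA")
bar <- insert_after(foo, 2, DNAString("G"))
as.character(bar)
```

With pos=2 this performs the same replacement as the session above.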

Generally speaking, if performance matters, you should get better
results by *not* switching back and forth between DNAString/DNAStringSet
objects and regular character vectors. For example, suppose you want to
replace the nucleotides starting at position 101 in human chrY with
those in chrM:

  o Using subseq<-():

    > library(BSgenome.Hsapiens.UCSC.hg19)
    > chrY <- unmasked(Hsapiens$chrY)
    > chrM <- unmasked(Hsapiens$chrM)
    > system.time(subseq(chrY, start=101, width=length(chrM)) <- chrM)
       user  system elapsed
      0.190   0.010   0.193
    > gc()
               used  (Mb) gc trigger  (Mb) max used  (Mb)
    Ncells  1158959  61.9    1835812  98.1  1394696  74.5
    Vcells 15445470 117.9   17364147 132.5 15568274 118.8

  o Using substr():

    > library(BSgenome.Hsapiens.UCSC.hg19)
    > chrY <- unmasked(Hsapiens$chrY)
    > chrM <- unmasked(Hsapiens$chrM)
    > system.time({tmp <- as.character(chrY);
                   tmp2 <- paste(substr(tmp, start=1, stop=100),
                                 as.character(chrM),
                                 substr(tmp, start=101+length(chrM),
                                             stop=length(chrY)),
                                 sep="");
                   chrY <- DNAString(tmp2)})
       user  system elapsed
      1.860   0.230   2.088
    > gc()
               used  (Mb) gc trigger  (Mb) max used  (Mb)
    Ncells  1128874  60.3    1835812  98.1  1368491  73.1
    Vcells 30276832 231.0   35483245 270.8 30284765 231.1

Using subseq<-() is faster (about 0.19 s vs 2.09 s elapsed here) and
more memory efficient (Vcells in use: about 118 Mb vs 231 Mb). It's
also more convenient.
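
To convince yourself that the two routes give the same sequence without
loading a whole genome, here is a toy sketch. The short 'target' and
'patch' strings are made-up stand-ins for chrY and chrM, and position 4
plays the role of position 101.

```r
library(Biostrings)

target <- DNAString("AAAAAAAAAA")  # toy stand-in for chrY
patch  <- DNAString("CGT")         # toy stand-in for chrM

# Route 1: replace a range directly on the DNAString with subseq<-().
via_subseq <- target
subseq(via_subseq, start = 4, width = length(patch)) <- patch

# Route 2: round trip through a regular character vector.
tmp <- as.character(target)
via_substr <- DNAString(paste(substr(tmp, start = 1, stop = 3),
                              as.character(patch),
                              substr(tmp, start = 4 + length(patch),
                                          stop = length(target)),
                              sep = ""))

# Both routes should agree on the resulting sequence.
identical(as.character(via_subseq), as.character(via_substr))
```

The toy case only checks equivalence; the timing and memory difference
shows up on chromosome-sized sequences like the ones above.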

Cheers,
H.



Thanks,
Andrew

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [email protected]
Phone:  (206) 667-5791
Fax:    (206) 667-1319
