Re: [Bioc-sig-seq] using subseq() in Biostrings to insert bases

Andrew Yee Tue, 20 Jul 2010 14:06:16 -0700

Thanks, that's very helpful.

Andrew


2010/7/20 Hervé Pagès <[email protected]>

> Hi Andrew,
>
>
> On 07/19/2010 08:11 PM, Andrew Yee wrote:
>
>> Is there a way to use subseq() to insert nucleotides?
>>
>> Take for example foo:
>>
>> foo<- DNAStringSet('ACTTA')
>>
>> I'd like to insert e.g. a G between positions 2 and 3 so that foo looks
>> like
>> 'ACGTTA'  Is there a way to do this using subseq()?  Or is an alternative
>> function recommended?
>>
>
> Just do:
>
>  > foo <- DNAString('ACTTA')
>  > subseq(foo, start=3, end=2) <- DNAString("G")
>  > foo
>    6-letter "DNAString" instance
>  seq: ACGTTA
>
> Generally speaking, if performance is important, you should get better
> results by *not* switching back and forth between DNAString/DNAStringSet
> objects and regular character vectors. For example, if you want to
> replace the nucleotides starting at pos 101 in Human chrY by those in
> chrM:
>
>  o Using subseq<-():
>
>    > library(BSgenome.Hsapiens.UCSC.hg19)
>    > chrY <- unmasked(Hsapiens$chrY)
>    > chrM <- unmasked(Hsapiens$chrM)
>    > system.time(subseq(chrY, start=101, width=length(chrM)) <- chrM)
>       user  system elapsed
>      0.190   0.010   0.193
>    > gc()
>               used  (Mb) gc trigger  (Mb) max used  (Mb)
>    Ncells  1158959  61.9    1835812  98.1  1394696  74.5
>    Vcells 15445470 117.9   17364147 132.5 15568274 118.8
>
>  o Using substr():
>
>    > library(BSgenome.Hsapiens.UCSC.hg19)
>    > chrY <- unmasked(Hsapiens$chrY)
>    > chrM <- unmasked(Hsapiens$chrM)
>    > system.time({tmp <- as.character(chrY);
>                   tmp2 <- paste(substr(tmp, start=1, stop=100),
>                                 as.character(chrM),
>                                 substr(tmp, start=101+length(chrM),
>                                             stop=length(chrY)),
>                                 sep="");
>                   chrY <- DNAString(tmp2)})
>       user  system elapsed
>      1.860   0.230   2.088
>    > gc()
>               used  (Mb) gc trigger  (Mb) max used  (Mb)
>    Ncells  1128874  60.3    1835812  98.1  1368491  73.1
>    Vcells 30276832 231.0   35483245 270.8 30284765 231.1
>
> Using subseq<-() is faster and more memory efficient. It's also
> more convenient.
>
> Cheers,
> H.
>
>
>
>
>> Thanks,
>> Andrew
>>
>>        [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: [email protected]
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] using subseq() in Biostrings to insert bases

Reply via email to