Hi Thomas,

The internals of the XStringSet container changed in BioC 2.5 for
two reasons. First, to support bigger objects: an object can now hold
more than 2^31 letters in total, because the limit is now 2^31 letters
per element and 2^31 elements, very much like for standard character
vectors. Second, to support more efficient combining through c() or
append(): this is now achieved with no copying of the sequence data.

This change in the internals is why reverseComplement(), reverse(),
complement() and chartr() are currently broken on XStringSet objects
that have gone through combining. Most methods that operate on
XStringSet objects were adapted, but those 4 were not, for lack of
time. I'm working on this right now and will post again here when it's
fixed. Thanks for the reminder and sorry for the inconvenience.
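
Until the fix is out, one possible stopgap is to rebuild the combined
object from a plain character vector, so that the 4 methods see a
freshly constructed set rather than one produced by combining. The
following is only a sketch: it assumes that as.character() is among
the methods already adapted to the new internals, which has not been
checked against Biostrings 2.14.1. The object names follow the example
in the original report.

```r
library(Biostrings)

# Two small sets, combined with c() as in the original report
dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC"))
dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC"))
dset3 <- c(dset1, dset2)

# Round-trip through a plain character vector to get a freshly
# constructed DNAStringSet.
# ASSUMPTION: as.character() works on the combined object in the
# affected release; this is not confirmed by the thread.
dset3fix <- DNAStringSet(as.character(dset3))

rc <- reverseComplement(dset3fix)
rc
```

The round trip copies the sequence data, so it gives up the no-copy
benefit of the new c() and is only reasonable for small objects.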

Cheers,
H.


Thomas Girke wrote:
Dear List,

Is there an explanation for the difference in behavior between
XStringSet objects that have gone through an append() or c() step and
those that haven't? I did not observe this problem in the previous
R/BioC release.

Below is a simple example to reproduce this error.

Thanks in advance for your help.

Thomas

## Example
library(Biostrings)
dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
dset3 <- c(dset1, dset2) # using append() doesn't fix the problem

reverseComplement(dset3)
Error in .local(x, ...) : IRanges internal error: length(x) != 1

DNAStringSet(dset3, start=1, end=4)
Error in super(x) : Biostrings internal error: length(x...@pool) != 1

## The problem goes away by doing the following
dset3fix <- DNAStringSet(unlist(strsplit(toString(dset3), ", ")))

reverseComplement(dset3fix)
  A DNAStringSet instance of length 6
    width seq
[1]     9 GTAATATGC
[2]     9 GGATCGATT
[3]     9 GTAATATGC
[4]    11 GTAATATGCGG
[5]    11 GGATCGATTTT
[6]    11 GTATTATATGC


DNAStringSet(dset3fix, start=1, end=4)
  A DNAStringSet instance of length 6
    width seq
[1]     4 GCAT
[2]     4 AATC
[3]     4 GCAT
[4]     4 CCGC
[5]     4 AAAA
[6]     4 GCAT


sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.14.1 IRanges_1.4.3

loaded via a namespace (and not attached):
[1] Biobase_2.6.0

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [email protected]
Phone:  (206) 667-5791
Fax:    (206) 667-1319
