Hi Maarten,

identical() is not reliable on DNAStringSet objects or other objects
that contain external pointers as it can return false negatives as well
as false positives. We'll fix the "cbind" and "rbind" methods for
SummarizedExperiment to work around this problem.
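
In the meantime, comparing the sequence content directly sidesteps the problem. Below is a minimal sketch, assuming it is enough to compare the sequences (and their names) as plain character vectors. `same_sequences` is a hypothetical helper, not a function in Biostrings or SummarizedExperiment, and not necessarily how the "cbind" and "rbind" fix will be implemented:

```r
library(Biostrings)

# Hypothetical helper: compare two DNAStringSet objects by content
# rather than by their internal pointers and environments.
# as.character() returns a (possibly named) character vector, so
# identical() here checks length, sequences and names.
same_sequences <- function(x, y) {
  identical(as.character(x), as.character(y))
}

s1 <- DNAStringSet("GACTC")
s2 <- DNAStringSet("GAATG")
s3 <- DNAStringSet("GACTC")

same_sequences(s1, s2)  # FALSE: the sequences differ
same_sequences(s1, s3)  # TRUE: same content, built separately
```

Note that this sketch ignores metadata columns and other slots, so it only answers whether the sequences themselves match.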

Thanks for the report.

H.

On 04/03/2017 12:58 AM, Maarten van Iterson wrote:
Dear list,

Combining SummarizedExperiment objects that contain a DNAStringSet in the
rowData does not seem to work properly. If I cbind two SummarizedExperiment
objects that I know are identical, an error is reported:

Error in FUN(X[[i]], ...) (from #2) :
  column(s) 'sourceSeq' in ‘mcols’ are duplicated and the data do not match

I think I traced the problem to `SummarizedExperiment:::.compare`, where
`identical` is used to compare DNAStringSets and does not behave as
expected: it should report the objects as identical, but reports that they
are not!

Here is an example of the opposite failure (which was easier to construct),
showing that `identical` returns TRUE where it should return FALSE.

library(Biostrings)
seq1 <- paste(DNA_BASES[sample(1:4, 5, replace=TRUE)], collapse="")
seq2 <- paste(DNA_BASES[sample(1:4, 5, replace=TRUE)], collapse="")

seq1
[1] "GACTC"
seq2
[1] "GAATG"

s1 <- DNAStringSet(seq1)
s2 <- DNAStringSet(seq2)

str(s1)
Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
  ..@ pool           :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
  .. .. ..@ xp_list                    :List of 1
  .. .. .. ..$ :<externalptr>
  .. .. ..@ .link_to_cached_object_list:List of 1
  .. .. .. ..$ :<environment:0x71f94d0>
  ..@ ranges         :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
  .. .. ..@ group          : int 1
  .. .. ..@ start          : int 1
  .. .. ..@ width          : int 5
  .. .. ..@ NAMES          : NULL
  .. .. ..@ elementType    : chr "integer"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ elementType    : chr "DNAString"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()
str(s2)
Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
  ..@ pool           :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
  .. .. ..@ xp_list                    :List of 1
  .. .. .. ..$ :<externalptr>
  .. .. ..@ .link_to_cached_object_list:List of 1
  .. .. .. ..$ :<environment:0x71f94d0>
  ..@ ranges         :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
  .. .. ..@ group          : int 1
  .. .. ..@ start          : int 1
  .. .. ..@ width          : int 5
  .. .. ..@ NAMES          : NULL
  .. .. ..@ elementType    : chr "integer"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ elementType    : chr "DNAString"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()

identical(seq1, seq2)
[1] FALSE
identical(s1, s2)
[1] TRUE
seq1 == seq2
[1] FALSE
s1 == s2
[1] FALSE

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] Biostrings_2.42.1          XVector_0.14.1
 [3] BBMRIomics_1.0.3           SummarizedExperiment_1.4.0
 [5] Biobase_2.34.0             GenomicRanges_1.26.4
 [7] GenomeInfoDb_1.10.3        IRanges_2.8.2
 [9] S4Vectors_0.12.2           BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10             AnnotationDbi_1.36.2     hms_0.3
 [4] GenomicAlignments_1.10.1 zlibbioc_1.20.0          BiocParallel_1.8.1
 [7] BSgenome_1.42.0          lattice_0.20-35          R6_2.2.0
[10] httr_1.2.1               tools_3.3.2              grid_3.3.2
[13] DBI_0.6                  assertthat_0.1           digest_0.6.12
[16] tibble_1.2               Matrix_1.2-8             readr_1.1.0
[19] rtracklayer_1.34.2       bitops_1.0-6             biomaRt_2.30.0
[22] RCurl_1.95-4.8           memoise_1.0.0            RSQLite_1.1-2
[25] compiler_3.3.2           GenomicFeatures_1.26.3   Rsamtools_1.26.1
[28] XML_3.98-1.5             jsonlite_1.3             VariantAnnotation_1.20.3


I don't completely understand why `identical` is not working properly. Is it
comparing the environment addresses? In the above example they are the same
although the sequences are not. In my case, the two SummarizedExperiments
contained the same DNAStringSets but perhaps had different environment
addresses?
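
The by-reference comparison asked about here can be seen in base R alone. This is a minimal sketch that only illustrates how `identical` treats environments. It does not reproduce the Biostrings internals, and the variable names are made up for illustration:

```r
# Two separate environments holding the same content.
e1 <- new.env()
e2 <- new.env()
assign("x", 1, envir = e1)
assign("x", 1, envir = e2)

# Comparing the contents: equal.
same_contents <- identical(as.list(e1), as.list(e2))  # TRUE

# Comparing the environments themselves: identical() checks whether they
# are the same object, not whether they hold the same values.
same_object <- identical(e1, e2)                      # FALSE

# A second name for the same environment compares as identical.
alias <- e1
same_alias <- identical(alias, e1)                    # TRUE
```

An object that stores environments in its slots inherits this behaviour, so two objects with equal content can still compare as different.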

Regards,
Maarten


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
