Thanks for looking into this!
Maarten
On Mon, Apr 3, 2017 at 7:00 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
> Hi Maarten,
>
> identical() is not reliable on DNAStringSet objects or other objects
> that contain external pointers, as it can return false negatives as well
> as false positives. We'll fix the "cbind" and "rbind" methods for
> SummarizedExperiment to work around this problem.
>
> Thanks for the report.
>
> H.
>
> On 04/03/2017 12:58 AM, Maarten van Iterson wrote:
>> Dear list,
>>
>> Combining SummarizedExperiment objects that contain a DNAStringSet in
>> the rowData does not seem to work properly. If I cbind two
>> SummarizedExperiment objects that I know are identical, an error is
>> reported:
>>
>> Error in FUN(X[[i]], ...) (from #2) :
>>   column(s) 'sourceSeq' in ‘mcols’ are duplicated and the data do not
>>   match
>>
>> I think I traced the problem to `SummarizedExperiment:::.compare`, which
>> uses `identical` to compare DNAStringSets, and `identical` does not
>> behave as expected: it should report that the objects are identical,
>> but it reports that they are not!
>>
>> Here is a counterexample (which was easier to construct) showing the
>> reverse problem: `identical` returns TRUE where it should return FALSE.
>>
>> > library(Biostrings)
>> > seq1 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
>> > seq2 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
>> >
>> > seq1
>> [1] "GACTC"
>> > seq2
>> [1] "GAATG"
>> >
>> > s1 <- DNAStringSet(seq1)
>> > s2 <- DNAStringSet(seq2)
>> >
>> > str(s1)
>> Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
>>   ..@ pool           :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
>>   .. .. ..@ xp_list                    :List of 1
>>   .. .. .. ..$ :<externalptr>
>>   .. .. ..@ .link_to_cached_object_list:List of 1
>>   .. .. .. ..$ :<environment:0x71f94d0>
>>   ..@ ranges         :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
>>   .. .. ..@ group          : int 1
>>   .. .. ..@ start          : int 1
>>   .. .. ..@ width          : int 5
>>   .. .. ..@ NAMES          : NULL
>>   .. .. ..@ elementType    : chr "integer"
>>   .. .. ..@ elementMetadata: NULL
>>   .. .. ..@ metadata       : list()
>>   ..@ elementType    : chr "DNAString"
>>   ..@ elementMetadata: NULL
>>   ..@ metadata       : list()
>> > str(s2)
>> Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
>>   ..@ pool           :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
>>   .. .. ..@ xp_list                    :List of 1
>>   .. .. .. ..$ :<externalptr>
>>   .. .. ..@ .link_to_cached_object_list:List of 1
>>   .. .. .. ..$ :<environment:0x71f94d0>
>>   ..@ ranges         :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
>>   .. .. ..@ group          : int 1
>>   .. .. ..@ start          : int 1
>>   .. .. ..@ width          : int 5
>>   .. .. ..@ NAMES          : NULL
>>   .. .. ..@ elementType    : chr "integer"
>>   .. .. ..@ elementMetadata: NULL
>>   .. .. ..@ metadata       : list()
>>   ..@ elementType    : chr "DNAString"
>>   ..@ elementMetadata: NULL
>>   ..@ metadata       : list()
>> >
>> > identical(seq1, seq2)
>> [1] FALSE
>> > identical(s1, s2)
>> [1] TRUE
>> > seq1 == seq2
>> [1] FALSE
>> > s1 == s2
>> [1] FALSE
>> >
>> > sessionInfo()
>> R version 3.3.2 (2016-10-31)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.2 LTS
>>
>> locale:
>>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>>  [1] Biostrings_2.42.1          XVector_0.14.1
>>  [3] BBMRIomics_1.0.3           SummarizedExperiment_1.4.0
>>  [5] Biobase_2.34.0             GenomicRanges_1.26.4
>>  [7] GenomeInfoDb_1.10.3        IRanges_2.8.2
>>  [9] S4Vectors_0.12.2           BiocGenerics_0.20.0
>>
>> loaded via a namespace (and not attached):
>>  [1] Rcpp_0.12.10             AnnotationDbi_1.36.2     hms_0.3
>>  [4] GenomicAlignments_1.10.1 zlibbioc_1.20.0          BiocParallel_1.8.1
>>  [7] BSgenome_1.42.0          lattice_0.20-35          R6_2.2.0
>> [10] httr_1.2.1               tools_3.3.2              grid_3.3.2
>> [13] DBI_0.6                  assertthat_0.1           digest_0.6.12
>> [16] tibble_1.2               Matrix_1.2-8             readr_1.1.0
>> [19] rtracklayer_1.34.2       bitops_1.0-6             biomaRt_2.30.0
>> [22] RCurl_1.95-4.8           memoise_1.0.0            RSQLite_1.1-2
>> [25] compiler_3.3.2           GenomicFeatures_1.26.3   Rsamtools_1.26.1
>> [28] XML_3.98-1.5             jsonlite_1.3             VariantAnnotation_1.20.3
>>
>> I don't completely understand why `identical` is not working properly.
>> Is it comparing the environment addresses? In the above example they are
>> the same, although the sequences are not. In my case, the two
>> SummarizedExperiments contained the same DNAStringSets but had different
>> environment addresses.
>>
>> Regards,
>> Maarten
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
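Since identical() cannot be trusted on objects holding external pointers, one way to work around it is to compare DNAStringSets by their content. The sketch below assumes only that Biostrings is installed. The helper `same_xstringset()` is hypothetical: it is not part of Biostrings or SummarizedExperiment, and it is not the fix that went into the "cbind"/"rbind" methods.

```r
library(Biostrings)

## Hypothetical helper (not a Biostrings or SummarizedExperiment function).
## identical() inspects the object's slots, including the external pointers
## and environments in the 'pool' slot, so its result does not necessarily
## reflect the sequence content. Here we compare the class, the names, and
## the sequences converted to plain character strings instead.
same_xstringset <- function(x, y) {
    identical(class(x), class(y)) &&
        identical(names(x), names(y)) &&
        identical(as.character(x), as.character(y))
}

s1 <- DNAStringSet("GACTC")
s2 <- DNAStringSet("GAATG")
s3 <- DNAStringSet("GACTC")   # same content as s1, built separately

same_xstringset(s1, s2)   # FALSE: the sequences differ
same_xstringset(s1, s3)   # TRUE: the sequences are the same
```

Converting with as.character() copies the sequence data into R character strings, which keeps the check simple but may be costly for very large sets.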