Hi Pete,

Thanks for suggesting this fast method. I've formalized it a little by using a generic (identicalVals) plus methods. I also tweaked it to avoid the false negatives that can occur when 'x' and 'y' have different names or different seqlevels. So no more fallback to 'all(x == y)'.
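To illustrate why a plain identical() can give false negatives when names or seqlevels differ, here is a minimal base R sketch. It is not the identicalVals code committed to SummarizedExperiment: the toy list structure and the names make_toy_ranges and identical_vals_sketch are hypothetical, and a real GenomicRanges object has more slots (strand, seqinfo, metadata columns) than this stand-in.

```r
# Toy stand-in for a GenomicRanges object: a plain list, NOT the real S4 class.
make_toy_ranges <- function(seqnames, start, end,
                            seqlevels = unique(seqnames), nms = NULL) {
  list(seqnames = factor(seqnames, levels = seqlevels),
       start    = as.integer(start),
       end      = as.integer(end),
       names    = nms)
}

# Hypothetical value-only comparison: ignores names and the order of the
# seqlevels, and compares whole vectors with identical() instead of
# building a logical vector with '=='.
identical_vals_sketch <- function(x, y) {
  # Remap y's factor codes onto x's level order, so that a different
  # seqlevels order alone cannot cause a mismatch.
  lev_map <- match(levels(y$seqnames), levels(x$seqnames))
  y_codes <- lev_map[as.integer(y$seqnames)]
  identical(as.integer(x$seqnames), y_codes) &&
    identical(x$start, y$start) &&
    identical(x$end, y$end)
}

x <- make_toy_ranges(c("chr1", "chr2", "chr1"), c(1, 5, 20), c(10, 8, 30),
                     seqlevels = c("chr1", "chr2"), nms = c("a", "b", "c"))
y <- make_toy_ranges(c("chr1", "chr2", "chr1"), c(1, 5, 20), c(10, 8, 30),
                     seqlevels = c("chr2", "chr1"))

identical(x, y)              # FALSE: names and seqlevels order differ
identical_vals_sketch(x, y)  # TRUE: same positions
```

The remapping step still allocates one integer vector the length of 'y', so this sketch trades a little memory for robustness to seqlevels order; when the levels are already in the same order that step could be skipped.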

Committed in SummarizedExperiment 1.3.82.

BTW please note that 'x == y' and 'identicalVals(x, y)' both ignore circularity of the underlying sequences. For example, ranges [1, 10] and [101, 110] represent the same position on a circular sequence of length 100, so they should be considered equal. However, neither 'x == y' nor 'identicalVals(x, y)' treats them as equal. Something we should address at some point...

Cheers,
H.

On 08/30/2016 05:57 AM, Peter Hickey wrote:
The cbind,SummarizedExperiment-method checks that the rowRanges slots are equal by calling `all(x == x1)`, where x and x1 are GenomicRanges objects. This can be kind of slow and creates a large temporary vector when length(x) is large.

I wrote a fast method to check equality of two GenomicRanges objects, see https://gist.github.com/PeteHaitch/13787125a165928e652dcfea2a8d166a. It cuts the check from 13.7 seconds to 0.004 seconds for a GenomicRanges object with 100M elements on my machine. It uses identical() on key slots of the GenomicRanges objects, and I'm not sure whether this could return false negatives, so I fall back to all(x == x1) if the fast method returns FALSE.

Could cbind,SummarizedExperiment-method be updated to use something like this?

Cheers,
Pete

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
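
The circularity caveat from the reply ([1, 10] versus [101, 110] on a circular sequence of length 100) can be worked through in a few lines of base R. This is only a sketch of one possible normalization, not something GenomicRanges or SummarizedExperiment provides: wrap_start and circular_equal are hypothetical helpers, and they assume both sets of ranges lie on the same circular sequence and that every range is shorter than the sequence.

```r
# Hypothetical helper (not part of GenomicRanges): map a start coordinate
# into 1..seqlength on a circular sequence.
wrap_start <- function(start, seqlength) {
  ((start - 1L) %% seqlength) + 1L
}

# Hypothetical helper: treat ranges as equal when their wrapped starts
# and their widths agree.
circular_equal <- function(start_x, end_x, start_y, end_y, seqlength) {
  width_x <- end_x - start_x + 1L
  width_y <- end_y - start_y + 1L
  all(wrap_start(start_x, seqlength) == wrap_start(start_y, seqlength)) &&
    all(width_x == width_y)
}

# Plain coordinate comparison sees two different ranges...
1L == 101L                                             # FALSE
# ...while the wrapped comparison sees the same position.
circular_equal(1L, 10L, 101L, 110L, seqlength = 100L)  # TRUE
```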