I agree that the use case is not that pressing, and we should defer this for now.
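
For context, the SEXP-backed alternative raised in the quoted messages below amounts to keeping the whole tree in one flat block of memory and linking nodes by index rather than by C pointer, so that a byte-for-byte copy of the block is still a valid tree. Below is a minimal C sketch of that idea, under stated assumptions: every name in it (Header, Node, tree_init, tree_insert, tree_count_overlaps, roundtrip_demo) is hypothetical and is neither IRanges nor Kent library code, the tree is a plain unbalanced interval tree, and malloc plus memcpy stand in for the R side. In a package, the buffer would presumably be the data area of a RAWSXP (allocVector(RAWSXP, n) and RAW(x)), which R already knows how to serialize.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a Header followed by an array of Nodes, all in one
 * contiguous buffer. Children are array indices, never C pointers. */
typedef struct {
    int32_t n_nodes;   /* nodes currently in use */
    int32_t capacity;  /* nodes the buffer can hold */
    int32_t root;      /* index of the root node, -1 when empty */
} Header;

typedef struct {
    int32_t start, end;   /* closed interval [start, end] */
    int32_t max_end;      /* largest end in the subtree rooted here */
    int32_t left, right;  /* child indices, -1 when absent */
} Node;

/* Bytes needed for a tree of the given capacity. */
size_t tree_bytes(int32_t capacity) {
    return sizeof(Header) + (size_t)capacity * sizeof(Node);
}

/* The node array lives right after the header (casting into the buffer). */
static Node *nodes_of(void *buf) {
    return (Node *)((char *)buf + sizeof(Header));
}

void tree_init(void *buf, int32_t capacity) {
    Header *h = (Header *)buf;
    h->n_nodes = 0;
    h->capacity = capacity;
    h->root = -1;
}

/* Insert keyed on start; returns 0 on success, -1 when the buffer is full. */
int tree_insert(void *buf, int32_t start, int32_t end) {
    Header *h = (Header *)buf;
    Node *nodes = nodes_of(buf);
    if (h->n_nodes == h->capacity) return -1;
    int32_t id = h->n_nodes++;
    nodes[id] = (Node){start, end, end, -1, -1};
    if (h->root < 0) { h->root = id; return 0; }
    int32_t cur = h->root;
    for (;;) {
        /* Every node on the path gains the new interval in its subtree. */
        if (end > nodes[cur].max_end) nodes[cur].max_end = end;
        int32_t *next = (start < nodes[cur].start) ? &nodes[cur].left
                                                   : &nodes[cur].right;
        if (*next < 0) { *next = id; return 0; }
        cur = *next;
    }
}

static int count_from(const Node *nodes, int32_t i, int32_t start, int32_t end) {
    /* Prune subtrees whose intervals all end before the query starts. */
    if (i < 0 || nodes[i].max_end < start) return 0;
    int n = count_from(nodes, nodes[i].left, start, end);
    if (nodes[i].start <= end && start <= nodes[i].end) n++;
    /* Right subtree starts are >= this start, so skip it if we are past end. */
    if (nodes[i].start <= end) n += count_from(nodes, nodes[i].right, start, end);
    return n;
}

/* Number of stored intervals overlapping [start, end]. */
int tree_count_overlaps(const void *buf, int32_t start, int32_t end) {
    const Header *h = (const Header *)buf;
    const Node *nodes = (const Node *)((const char *)buf + sizeof(Header));
    return count_from(nodes, h->root, start, end);
}

/* Build a tree, copy its bytes (a stand-in for save/load), free the
 * original, and query the copy. */
int roundtrip_demo(void) {
    size_t nbytes = tree_bytes(8);
    void *live = malloc(nbytes);
    void *restored = malloc(nbytes);
    assert(live != NULL && restored != NULL);

    tree_init(live, 8);
    tree_insert(live, 10, 20);
    tree_insert(live, 15, 30);
    tree_insert(live, 40, 50);

    memcpy(restored, live, nbytes);
    free(live);

    int hits = tree_count_overlaps(restored, 18, 25);
    free(restored);
    return hits;
}
```

Because the copy contains no addresses, querying it after the original is freed is safe, which is the property a pointer-based tree behind an external pointer lacks. Only the code that obtains the buffer would need to know whether the memory comes from malloc or from an R vector, which is roughly the decoupling suggested in the quoted message.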

On Tue, Jul 1, 2014 at 12:04 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> On 07/01/2014 10:38 AM, Michael Lawrence wrote:
>
>> The difference of course being that you implemented those trees from
>> scratch, while we're relying on the Kent library for the low-level
>> management of the tree. We would probably need to break from the Kent
>> library to pursue this approach.
>
> I see. That makes things a little bit more complicated. I wonder if the
> whole effort is worth it, given that serialization of a GIntervalTree
> doesn't seem like a common use case and that re-processing the
> GIntervalTree from the GRanges object may not take that much
> time (I didn't do any timings to back this up, though). For PDict
> objects it was nice to be able to serialize them, even though it's
> probably not something the user should do. Turning a DNAStringSet
> object into a PDict object is very fast, and the resulting object is
> so big that a save/load cycle would actually take much longer than
> re-processing the PDict object at each new session.
>
> Also, my feeling is that the time and effort required to break from the
> Kent library would perhaps be better spent trying to implement something
> new, like the Nested Containment List algo. Since this would probably
> have to be implemented from scratch anyway, it would make sense to use
> SEXP-based memory or, even better, to put a thin abstract layer between
> the algo itself and memory management so they are decoupled.
>
> Cheers,
> H.
>
>> On Tue, Jul 1, 2014 at 9:05 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>>> Hi Hector, Michael,
>>>
>>> On 07/01/2014 05:57 AM, Michael Lawrence wrote:
>>>
>>>> It seems tough to make this work. There is no way for the R
>>>> serialization machinery to understand what needs to be serialized
>>>> after the external pointer. The easiest approach to fixing this
>>>> would be to reimplement everything on top of SEXPs, which is to
>>>> say, it would not be easy.
>>>
>>> This is what I did with PDict objects to store the Aho-Corasick tree.
>>> It's actually easier than it sounds. You can use any atomic type, say
>>> INTSXP or RAWSXP, it doesn't matter. That's just a way to get memory.
>>> Then you do what you want with it (through casting the pointer to it).
>>> It not only solves the serialization problem, it also automatically
>>> manages the memory, which is now in the hands of the garbage
>>> collector.
>>>
>>> Cheers,
>>> H.
>>>
>>>> Alternatively, we could write our own serializer. It seems R
>>>> needs a way to register (de)serializers for external pointers.
>>>>
>>>> On Tue, Jul 1, 2014 at 5:37 AM, Hector Corrada Bravo
>>>> <hcorr...@gmail.com> wrote:
>>>>
>>>>> Confirmed. Will look into it now.
>>>>> Thanks for writing!
>>>>> Hector
>>>>>
>>>>> On Tue, Jul 1, 2014 at 2:40 AM, Kristoffer Vitting-Seerup
>>>>> <kristoffer.vittingsee...@bio.ku.dk> wrote:
>>>>>
>>>>>> Hi bioc-devel
>>>>>>
>>>>>> I've found an error in the usage of GIntervalTree:
>>>>>>
>>>>>> test <- GRanges(seqnames='Chr1', range=IRanges(start=10,end=20))
>>>>>> test
>>>>>>
>>>>>> GRanges with 1 range and 0 metadata columns:
>>>>>>       seqnames    ranges strand
>>>>>>          <Rle> <IRanges>  <Rle>
>>>>>>   [1]     Chr1  [10, 20]      *
>>>>>>
>>>>>> This object I can save and load without problem:
>>>>>>
>>>>>> save(test, file='test.Rdata')
>>>>>> rm(test)
>>>>>> load('test.Rdata')
>>>>>> test
>>>>>>
>>>>>> GRanges with 1 range and 0 metadata columns:
>>>>>>       seqnames    ranges strand
>>>>>>          <Rle> <IRanges>  <Rle>
>>>>>>   [1]     Chr1  [10, 20]      *
>>>>>>
>>>>>> But if I convert it to a GIntervalTree (for faster overlap
>>>>>> finding) I get a fatal error when loading:
>>>>>>
>>>>>> test2 <- GIntervalTree(test)
>>>>>> test2
>>>>>>
>>>>>> GIntervalTree with 1 range and 0 metadata columns:
>>>>>>       seqnames    ranges strand
>>>>>>          <Rle> <IRanges>  <Rle>
>>>>>>   [1]     Chr1  [10, 20]      *
>>>>>>
>>>>>> save(test2, file='test2.Rdata')
>>>>>> rm(test2)
>>>>>> load('test2.Rdata')
>>>>>> test2
>>>>>>
>>>>>> GIntervalTree with 1 range and 0 metadata columns:
>>>>>>
>>>>>>  *** caught segfault ***
>>>>>> address 0xc, cause 'memory not mapped'
>>>>>>
>>>>>> Traceback:
>>>>>>  1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>>>>>  2: .Call2(fun, object@ptr, ..., PACKAGE = "IRanges")
>>>>>>  3: .IntervalForestCall(from, "asIRanges")
>>>>>>  4: asMethod(object)
>>>>>>  5: as(x@ranges, "IRanges")
>>>>>>  6: .GT_reorderValue(x, as(x@ranges, "IRanges"))
>>>>>>  7: .local(x, ...)
>>>>>>  8: ranges(x)
>>>>>>  9: ranges(x)
>>>>>>
>>>>>> Possible actions:
>>>>>> 1: abort (with core dump, if enabled)
>>>>>> 2: normal R exit
>>>>>> 3: exit R without saving workspace
>>>>>> 4: exit R saving workspace
>>>>>>
>>>>>> My session info:
>>>>>>
>>>>>> sessionInfo()
>>>>>> R version 3.1.0 (2014-04-10)
>>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] grDevices datasets grid parallel stats graphics utils methods base
>>>>>>
>>>>>> other attached packages:
>>>>>>  [1] spliceR_1.5.0 plyr_1.8.1 RColorBrewer_1.0-5 VennDiagram_1.6.5 cummeRbund_2.7.1 Gviz_1.9.4 rtracklayer_1.25.8 GenomicRanges_1.17.14 GenomeInfoDb_1.1.5 IRanges_1.99.13
>>>>>> [11] S4Vectors_0.0.6 fastcluster_1.1.13 reshape2_1.4 ggplot2_0.9.3.1 RSQLite_0.11.4 DBI_0.2-7 BiocGenerics_0.11.2
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>>  [1] AnnotationDbi_1.27.6 BBmisc_1.6 BSgenome_1.33.5 BatchJobs_1.2 Biobase_2.25.0 BiocParallel_0.7.0 Biostrings_2.33.8 Formula_1.1-1 GenomicAlignments_1.1.10
>>>>>> [10] GenomicFeatures_1.17.6 Hmisc_3.14-4 MASS_7.3-33 R.methodsS3_1.6.1 RCurl_1.95-4.1 Rcpp_0.11.1 Rsamtools_1.17.14 VariantAnnotation_1.11.5 XML_3.98-1.1
>>>>>> [19] XVector_0.5.6 biomaRt_2.21.0 biovizBase_1.13.7 bitops_1.0-6 brew_1.0-6 cluster_1.15.2 codetools_0.2-8 colorspace_1.2-4 dichromat_2.0-0
>>>>>> [28] digest_0.6.4 fail_1.2 foreach_1.4.2 gtable_0.1.2 iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 matrixStats_0.8.14 munsell_0.4.2
>>>>>> [37] proto_0.3-10 scales_0.2.4 sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 zlibbioc_1.11.1
>>>>>>
>>>>>> --
>>>>>> Kindest regards
>>>>>> Kristoffer Vitting-Seerup, cand.scient. (M.Sc.), Ph.D Fellow
>>>>>> Sandelin Group
>>>>>>
>>>>>> Bioinformatics Centre | Biotech Research & Innovation Centre (BRIC),
>>>>>> Dep. Of Biology
>>>>>> University of Copenhagen
>>>>>> Building 1, 3th floor, office 3 (1-3-03)
>>>>>> Ole Maaløes Vej 5
>>>>>> DK-2200 Copenhagen N
>>>>>> Denmark
>>>>>> http://binf.ku.dk | http://www.bric.ku.dk
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319

[[alternative HTML version deleted]]

_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel