I agree that the use case is not that pressing, and we should defer this
for now.


On Tue, Jul 1, 2014 at 12:04 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> On 07/01/2014 10:38 AM, Michael Lawrence wrote:
>
>> The difference of course being that you implemented those trees from
>> scratch, while we're relying on the Kent library for the low-level
>> management of the tree. We would probably need to break from the Kent
>> library to pursue this approach.
>>
>
> I see. That makes things a little bit more complicated. I wonder if the
> whole effort is worth it given that serialization of a GIntervalTree
> doesn't seem like a common use case and that re-processing the
> GIntervalTree from the GRanges object maybe doesn't take that much
> time (I didn't do any timings to back this up though). For PDict
> objects it was nice to be able to serialize them even though it's
> probably not something the user should do. Turning a DNAStringSet
> object into a PDict object is very fast and the resulting object is
> so big that a save/load cycle would actually take much longer than
> re-processing the PDict object at each new session.
>
> Also, my feeling is that the time and effort required to break from the
> Kent library would perhaps be better spent trying to implement something
> new, like the Nested Containment List algo. Since this would probably
> have to be implemented from scratch anyway, it would make sense to use
> SEXP-based memory, or even better, to put a thin abstract layer between
> the algo itself and memory management so they are decoupled.
>
> Cheers,
> H.
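
A thin layer between an algorithm and its memory management, as suggested
above, might look like the following minimal C sketch. Everything named here
(`IntAllocator`, `malloc_ints`, `running_max_end`) is hypothetical and is not
part of IRanges or the Kent library. The backend shown uses malloc; an
R-backed variant would hand out memory owned by an R vector instead, which is
an assumption not shown in the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory interface: the algorithm requests ints through it
 * and never calls malloc (or R's allocator) directly. */
typedef struct {
    int *(*alloc_ints)(void *ctx, size_t n);  /* returns NULL on failure */
    void *ctx;                                /* backend-specific state */
} IntAllocator;

/* Backend: plain malloc. The context is an optional allocation counter,
 * used here only so the demo can observe that the backend was called. */
static int *malloc_ints(void *ctx, size_t n) {
    size_t *calls = ctx;
    if (calls) ++*calls;
    return malloc(n * sizeof(int));
}

/* Algorithm side: for intervals sorted by start, compute the running
 * maximum of their ends (a common helper for overlap queries). It only
 * knows the IntAllocator interface, not where the memory comes from. */
static int *running_max_end(const IntAllocator *mem, const int *ends, size_t n) {
    int *out = mem->alloc_ints(mem->ctx, n);
    if (!out) return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (i > 0 && out[i - 1] > ends[i]) ? out[i - 1] : ends[i];
    return out;
}

/* Demo: returns the last running maximum, or -1 if anything went wrong. */
static int demo_allocator(void) {
    size_t calls = 0;
    IntAllocator mem = { malloc_ints, &calls };
    const int ends[] = { 20, 8, 30, 25 };
    int *mx = running_max_end(&mem, ends, 4);
    if (!mx) return -1;
    int ok = (mx[0] == 20 && mx[1] == 20 && mx[2] == 30 && calls == 1);
    int last = mx[3];
    free(mx);
    return ok ? last : -1;
}
```

Switching backends means passing a different `IntAllocator`; the algorithm
function itself does not change.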
>
>
>>
>> On Tue, Jul 1, 2014 at 9:05 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>>     Hi Hector, Michael,
>>
>>
>>     On 07/01/2014 05:57 AM, Michael Lawrence wrote:
>>
>>         It seems tough to make this work. There is no way for the R
>>         serialization machinery to understand what needs to be serialized
>>         after the external pointer. The easiest approach to fixing this
>>         would be to reimplement everything on top of SEXPs, which is to
>>         say, it would not be easy.
>>
>>
>>     This is what I did with PDict objects to store the Aho-Corasick tree.
>>     It's actually easier than it sounds. You can use any atomic type, say
>>     INTSXP or RAWSXP, it doesn't matter. That's just a way to get memory.
>>     Then you do what you want with it (through casting the pointer to it).
>>     It not only solves the serialization problem, it also automatically
>>     manages the memory, which is now in the hands of the garbage
>>     collector.
>>
>>     Cheers,
>>     H.
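
The idea described above can be sketched in plain C, with a malloc'd int
array standing in for the data area of an INTSXP. This is not the PDict code;
the `Node` layout and all function names are hypothetical. The point it
illustrates is that nodes link to each other by array index rather than by
address, so a byte-for-byte copy of the buffer (which is what saving and
reloading an integer vector amounts to) is still a valid tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A node made only of ints, so an array of nodes fits in any int buffer.
 * Children are array indices (-1 = none), never memory addresses. */
typedef struct {
    int start, end;   /* closed interval [start, end] */
    int left, right;  /* indices of child nodes, or -1 */
} Node;

#define NODE_INTS ((int)(sizeof(Node) / sizeof(int)))

/* Buffer layout: buf[0] = number of nodes in use, then the node array. */
static size_t tree_buf_len(int capacity) {
    return 1 + (size_t)capacity * NODE_INTS;
}

static void tree_init(int *buf) { buf[0] = 0; }

/* Insert [start, end] into a binary search tree keyed on start.
 * The caller guarantees the buffer has room for one more node. */
static int tree_insert(int *buf, int start, int end) {
    Node *nodes = (Node *)(buf + 1);   /* cast the raw memory to nodes */
    int id = buf[0]++;
    nodes[id] = (Node){ start, end, -1, -1 };
    if (id == 0) return id;            /* first node becomes the root */
    int cur = 0;
    for (;;) {
        int *link = start < nodes[cur].start ? &nodes[cur].left
                                             : &nodes[cur].right;
        if (*link < 0) { *link = id; return id; }
        cur = *link;
    }
}

/* Count stored intervals overlapping [qs, qe], starting at node idx. */
static int count_overlaps(const int *buf, int idx, int qs, int qe) {
    if (idx < 0 || buf[0] == 0) return 0;
    const Node *n = (const Node *)(buf + 1) + idx;
    int hits = (n->start <= qe && qs <= n->end);
    hits += count_overlaps(buf, n->left, qs, qe);
    if (n->start <= qe)   /* right subtree only holds starts >= n->start */
        hits += count_overlaps(buf, n->right, qs, qe);
    return hits;
}

/* Build a tree, copy its bytes as a serializer would, query the copy. */
static int demo_flat_tree(void) {
    size_t len = tree_buf_len(3);
    int *buf = malloc(len * sizeof(int));
    int *copy = malloc(len * sizeof(int));
    if (!buf || !copy) { free(buf); free(copy); return -1; }
    tree_init(buf);
    tree_insert(buf, 10, 20);
    tree_insert(buf, 5, 8);
    tree_insert(buf, 15, 30);
    memcpy(copy, buf, len * sizeof(int));
    free(buf);                                   /* original is gone */
    int hits = count_overlaps(copy, 0, 18, 25);  /* [10,20] and [15,30] */
    free(copy);
    return hits;
}
```

In an R package the buffer would presumably come from an integer vector and
be reached through `INTEGER()`, so the garbage collector owns it and no
explicit `free` is needed.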
>>
>>         Alternatively, we could write our own serializer. It seems R
>>         needs a way to register (de)serializers for external pointers.
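
A hand-written serializer of the kind mentioned above would flatten the
pointer-linked tree into plain integers before saving and rebuild it after
loading. The following is a minimal C sketch under stated assumptions: the
`PNode` layout and all function names are hypothetical and do not reflect the
Kent library's structures or any R hook for external pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Pointer-linked node, as a typical C tree library might define it. */
typedef struct PNode {
    int start, end;
    struct PNode *left, *right;
} PNode;

static PNode *pnode_new(int start, int end, PNode *left, PNode *right) {
    PNode *n = malloc(sizeof *n);
    if (!n) abort();
    n->start = start; n->end = end; n->left = left; n->right = right;
    return n;
}

static void pnode_free(PNode *n) {
    if (!n) return;
    pnode_free(n->left);
    pnode_free(n->right);
    free(n);
}

static size_t pnode_count(const PNode *n) {
    return n ? 1 + pnode_count(n->left) + pnode_count(n->right) : 0;
}

/* Serialize: pre-order walk writing 3 ints per node: start, end, and a
 * flag word (bit 0 = has left child, bit 1 = has right child). */
static void flatten(const PNode *n, int *out, size_t *pos) {
    if (!n) return;
    out[(*pos)++] = n->start;
    out[(*pos)++] = n->end;
    out[(*pos)++] = (n->left ? 1 : 0) | (n->right ? 2 : 0);
    flatten(n->left, out, pos);
    flatten(n->right, out, pos);
}

/* Deserialize: read records in the same order and allocate fresh nodes. */
static PNode *unflatten(const int *in, size_t *pos) {
    int start = in[(*pos)++];
    int end   = in[(*pos)++];
    int flags = in[(*pos)++];
    PNode *left  = (flags & 1) ? unflatten(in, pos) : NULL;
    PNode *right = (flags & 2) ? unflatten(in, pos) : NULL;
    return pnode_new(start, end, left, right);
}

/* Round trip: returns 1 if the rebuilt tree matches the original. */
static int demo_roundtrip(void) {
    PNode *tree = pnode_new(10, 20, pnode_new(5, 8, NULL, NULL),
                                    pnode_new(15, 30, NULL, NULL));
    size_t n = pnode_count(tree), pos = 0;
    int *flat = malloc(3 * n * sizeof(int));
    if (!flat) abort();
    flatten(tree, flat, &pos);
    pnode_free(tree);            /* the old addresses are now meaningless */

    pos = 0;
    PNode *copy = unflatten(flat, &pos);
    int ok = copy->start == 10 && copy->left->start == 5 &&
             copy->right->end == 30 && pos == 3 * n;
    free(flat);
    pnode_free(copy);
    return ok;
}
```

The flat int array is the part that could be stored in an ordinary R vector
and saved; the pointer tree would be rebuilt from it on first use after
loading.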
>>
>>
>>         On Tue, Jul 1, 2014 at 5:37 AM, Hector Corrada Bravo
>>         <hcorr...@gmail.com> wrote:
>>
>>             Confirmed. Will look into it now.
>>             Thanks for writing!
>>             Hector
>>
>>
>>             On Tue, Jul 1, 2014 at 2:40 AM, Kristoffer Vitting-Seerup
>>             <kristoffer.vittingseerup@bio.ku.dk> wrote:
>>
>>                 Hi bioc-devel
>>
>>                 I’ve found an error in the usage of GIntervalTree:
>>
>>                     test <- GRanges(seqnames='Chr1',
>>                                     ranges=IRanges(start=10, end=20))
>>                     test
>>
>>                 GRanges with 1 range and 0 metadata columns:
>>                         seqnames    ranges strand
>>                            <Rle> <IRanges>  <Rle>
>>                     [1]     Chr1  [10, 20]      *
>>
>>                 This object I can save and load without problem:
>>
>>                     save(test, file='test.Rdata')
>>                     rm(test)
>>                     load('test.Rdata')
>>                     test
>>
>>                 GRanges with 1 range and 0 metadata columns:
>>                         seqnames    ranges strand
>>                            <Rle> <IRanges>  <Rle>
>>                     [1]     Chr1  [10, 20]      *
>>
>>
>>                 But if I convert it to a GIntervalTree (for faster
>>                 overlap finding), I get a fatal error when loading:
>>
>>                     test2 <- GIntervalTree(test)
>>                     test2
>>
>>                 GIntervalTree with 1 range and 0 metadata columns:
>>                         seqnames    ranges strand
>>                            <Rle> <IRanges>  <Rle>
>>                     [1]     Chr1  [10, 20]      *
>>
>>                     save(test2, file='test2.Rdata')
>>                     rm(test2)
>>                     load('test2.Rdata')
>>                     test2
>>
>>                 GIntervalTree with 1 range and 0 metadata columns:
>>
>>                    *** caught segfault ***
>>                 address 0xc, cause 'memory not mapped'
>>
>>                 Traceback:
>>                    1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>                    2: .Call2(fun, object@ptr, ..., PACKAGE = "IRanges")
>>                    3: .IntervalForestCall(from, "asIRanges")
>>                    4: asMethod(object)
>>                    5: as(x@ranges, "IRanges")
>>                    6: .GT_reorderValue(x, as(x@ranges, "IRanges"))
>>                    7: .local(x, ...)
>>                    8: ranges(x)
>>                    9: ranges(x)
>>
>>                 Possible actions:
>>                 1: abort (with core dump, if enabled)
>>                 2: normal R exit
>>                 3: exit R without saving workspace
>>                 4: exit R saving workspace
>>
>>
>>                 My session info:
>>
>>                     sessionInfo()
>>
>>                 R version 3.1.0 (2014-04-10)
>>                 Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>>                 locale:
>>                 [1] C
>>
>>                 attached base packages:
>>                 [1] grDevices datasets  grid      parallel  stats     graphics  utils
>>                     methods   base
>>
>>                 other attached packages:
>>                  [1] spliceR_1.5.0         plyr_1.8.1            RColorBrewer_1.0-5
>>                      VennDiagram_1.6.5     cummeRbund_2.7.1      Gviz_1.9.4
>>                      rtracklayer_1.25.8    GenomicRanges_1.17.14 GenomeInfoDb_1.1.5
>>                      IRanges_1.99.13
>>                 [11] S4Vectors_0.0.6       fastcluster_1.1.13    reshape2_1.4
>>                      ggplot2_0.9.3.1       RSQLite_0.11.4        DBI_0.2-7
>>                      BiocGenerics_0.11.2
>>
>>                 loaded via a namespace (and not attached):
>>                  [1] AnnotationDbi_1.27.6     BBmisc_1.6               BSgenome_1.33.5
>>                      BatchJobs_1.2            Biobase_2.25.0           BiocParallel_0.7.0
>>                      Biostrings_2.33.8        Formula_1.1-1            GenomicAlignments_1.1.10
>>                 [10] GenomicFeatures_1.17.6   Hmisc_3.14-4             MASS_7.3-33
>>                      R.methodsS3_1.6.1        RCurl_1.95-4.1           Rcpp_0.11.1
>>                      Rsamtools_1.17.14        VariantAnnotation_1.11.5 XML_3.98-1.1
>>                 [19] XVector_0.5.6            biomaRt_2.21.0           biovizBase_1.13.7
>>                      bitops_1.0-6             brew_1.0-6               cluster_1.15.2
>>                      codetools_0.2-8          colorspace_1.2-4         dichromat_2.0-0
>>                 [28] digest_0.6.4             fail_1.2                 foreach_1.4.2
>>                      gtable_0.1.2             iterators_1.0.7          lattice_0.20-29
>>                      latticeExtra_0.6-26      matrixStats_0.8.14       munsell_0.4.2
>>                 [37] proto_0.3-10             scales_0.2.4             sendmailR_1.1-2
>>                      splines_3.1.0            stats4_3.1.0             stringr_0.6.2
>>                      survival_2.37-7          tools_3.1.0              zlibbioc_1.11.1
>>
>>
>>
>>                 --
>>                 Kindest regards
>>                 Kristoffer Vitting-Seerup, cand.scient. (M.Sc.),
>>                 Ph.D Fellow
>>                 Sandelin Group
>>
>>                 Bioinformatics Centre | Biotech Research & Innovation
>>                 Centre (BRIC), Dep.
>>                 Of Biology
>>                 University of Copenhagen
>>                 Building 1, 3th floor, office 3 (1-3-03)
>>                 Ole Maaløes Vej 5
>>
>>                 DK-2200 Copenhagen N
>>                 Denmark
>>                 http://binf.ku.dk | http://www.bric.ku.dk
>>
>>                 _________________________________________________
>>                 Bioc-devel@r-project.org mailing list
>>                 https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>     --
>>     Hervé Pagès
>>
>>     Program in Computational Biology
>>     Division of Public Health Sciences
>>     Fred Hutchinson Cancer Research Center
>>     1100 Fairview Ave. N, M1-B514
>>     P.O. Box 19024
>>     Seattle, WA 98109-1024
>>
>>     E-mail: hpa...@fhcrc.org
>>     Phone:  (206) 667-5791
>>     Fax:    (206) 667-1319
>>
>>
>>

