On 12/09/2013 10:50 AM, Kasper Daniel Hansen wrote:
I agree with Michael. I don't think we want to deprive ourselves of good
approaches just to support Windows, especially in a case like this where
the on-disk representation is optional.

I agree mmap is appealing. I just didn't want to have to depend on
it in XVector, which is at the bottom of the package stack. For now my
focus/interest is more on the OnDiskVector concept/API. Specific storage
back-ends can be implemented as concrete subclasses. There are already
2 of them (DirectRaw and SerializedRaw). Others can be added for mmap
and HDF5 for example. They don't necessarily have to be implemented in
XVector.
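
To make the back-end idea concrete, here is a minimal sketch, written in
Python rather than R/C purely for brevity. None of these names come from
XVector: OnDiskBytes, SeekReadBytes and MmapBytes are hypothetical stand-ins
for a virtual class with two concrete storage back-ends, and the only
operation shown is random access to a subsequence of bytes.

```python
import mmap
import os
import tempfile
from abc import ABC, abstractmethod


class OnDiskBytes(ABC):
    """Hypothetical virtual class: a read-only byte vector stored in a file."""

    def __init__(self, path):
        self.path = path
        self.length = os.path.getsize(path)

    def _check(self, start, width):
        # Reject requests that fall outside the stored vector.
        if start < 0 or width < 0 or start + width > self.length:
            raise IndexError("subsequence out of bounds")

    @abstractmethod
    def extract(self, start, width):
        """Return `width` bytes starting at 0-based offset `start`."""


class SeekReadBytes(OnDiskBytes):
    """Back-end that seeks to the offset and reads only the requested bytes."""

    def extract(self, start, width):
        self._check(start, width)
        with open(self.path, "rb") as fh:
            fh.seek(start)
            return fh.read(width)


class MmapBytes(OnDiskBytes):
    """Back-end that maps the file into memory and slices the mapping."""

    def extract(self, start, width):
        self._check(start, width)
        if width == 0:
            return b""
        with open(self.path, "rb") as fh:
            with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return mm[start:start + width]


def demo():
    # Write a small toy sequence to a temporary file, then read the same
    # subsequence through both back-ends.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(b"ACGTACGTTTGA")
        return [cls(path).extract(4, 5) for cls in (SeekReadBytes, MmapBytes)]
    finally:
        os.remove(path)
```

Callers only ever see extract(), so under this assumption an mmap or HDF5
back-end could be added as another subclass, possibly in a separate package,
without touching the code that uses the vector.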

H.


Kasper


On Mon, Dec 9, 2013 at 1:46 PM, Michael Lawrence
<lawrence.mich...@gene.com> wrote:

    On Mon, Dec 9, 2013 at 9:30 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

     > On 12/09/2013 05:39 AM, Michael Lawrence wrote:
     >
     >> Any thoughts about using mmap(), so that SharedRaw and OnDiskRaw
     >> just operate on a pointer as the abstraction?
     >>
     >
     > Martin mentioned mmap to me for this project but I had some concerns
     > about Windows compatibility. Are there CRAN or BioC packages that use
     > it? Would be interesting to have a look at them.
     >

    bigmemory is a CRAN package, and it is extended by bigmemoryExtras in
    Bioconductor.

    No Windows version available, of course. But seriously, who uses Windows
    to crunch data? Easy enough to fall back to the in-memory implementation.



     > H.
     >
     >
     >> Michael
     >>
     >>
     >> On Sun, Dec 8, 2013 at 11:39 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
     >>
     >>     Hi Michael,
     >>
     >>     The OnDiskXRaw virtual class (if this is what you're referring to)
     >>     is still a very early work-in-progress. The idea is to experiment
     >>     with on-disk representation of atomic vectors and direct random
     >>     access to subsequences of the vector. The exact storage mode is
     >>     implemented by concrete subclasses (currently only DirectRaw and
     >>     SerializedRaw). OnDiskXRaw is actually analogous to SharedRaw,
     >>     except that with the latter the "shared" sequence of bytes resides
     >>     in memory.
     >>
     >>     If we had "on-disk" support for all atomic vectors, it sounds like
     >>     it would then be easy to support "on-disk" versions of higher-level
     >>     objects like IRanges or GRanges. They would be defined like their
     >>     "in-memory" counterparts, except that the slots that are atomic
     >>     vectors in the "in-memory" version would just need to be replaced
     >>     by "on-disk" atomic vectors. "On-disk" versions of DNAString (and
     >>     even DNAStringSet) objects could also easily be implemented, e.g.
     >>     by just making the "shared" slot an OnDiskXRaw object instead of a
     >>     SharedRaw object.
     >>
     >>     Putting SharedRaw and OnDiskXRaw under the same umbrella (i.e.
     >>     under a virtual class) and using that virtual class to specify the
     >>     slot of higher-level objects like DNAString is tempting, but
     >>     realistically we don't operate on an on-disk object like we do on
     >>     an in-memory object.
     >>
     >>     Having an "on-disk" version of DNAString with direct random access
     >>     was in fact the initial motivation for OnDiskXRaw. The use case for
     >>     this was to support direct random access in BSgenome objects
     >>     without having to change the way the chromosomes are stored on disk
     >>     (they're stored as serialized raw vectors). I've finally
     >>     implemented this feature (it will soon be pushed to BioC devel),
     >>     but I changed the storage and didn't use OnDiskXRaw in the end.
     >>
     >>     H.
     >>
     >>
     >>
     >>     On 12/05/2013 06:43 AM, Michael Lawrence wrote:
     >>
     >>         A nice goal for the XVector package would be full implementation
     >>         of the R vector API on top of the already existing memory-sharing
     >>         (rather than memory-duplicating) data structures. The actual
     >>         storage mode of the data should obviously be abstracted, e.g.,
     >>         on-disk should be treated the same as the externalptr
     >>         representation. Much of the implementation will need to be in C,
     >>         unless we want to pay the price of extracting things into
     >>         ordinary R vectors. Should the abstraction therefore be dropped
     >>         down to the C level, so that the implementations can more easily
     >>         share code with each other? Anything to gain here from the
     >>         externalVector package?
     >>
     >>         _______________________________________________
     >>         Bioc-devel@r-project.org mailing list
     >>         https://stat.ethz.ch/mailman/listinfo/bioc-devel
     >>
     >>
     >>     --
     >>     Hervé Pagès
     >>
     >>     Program in Computational Biology
     >>     Division of Public Health Sciences
     >>     Fred Hutchinson Cancer Research Center
     >>     1100 Fairview Ave. N, M1-B514
     >>     P.O. Box 19024
     >>     Seattle, WA 98109-1024
     >>
     >>     E-mail: hpa...@fhcrc.org
     >>     Phone: (206) 667-5791
     >>     Fax: (206) 667-1319
     >>
     >>
     >>




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
