Hi Björn,

The easiest way to make this work would be to have the data set live on a network filesystem (e.g. NFS) or block device (e.g. NBD, iSCSI) which you can mount on your local system and then use mmap().
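
As a rough illustration of the mmap() route, here is a minimal sketch assuming a Linux/POSIX system and a data file that already lives on a mounted filesystem. `MappedFile` is a hypothetical helper written for this example; it is not part of Cap'n Proto.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not a Cap'n Proto class): maps a whole file read-only.
// The kernel loads pages lazily, so only the parts actually touched are read
// from the underlying (possibly network-backed) storage.
class MappedFile {
public:
  explicit MappedFile(const std::string& path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);

    struct stat st;
    if (fstat(fd, &st) != 0) {
      close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<size_t>(st.st_size);

    void* p = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
    data_ = static_cast<const uint8_t*>(p);
  }

  ~MappedFile() { munmap(const_cast<uint8_t*>(data_), size_); }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const uint8_t* data() const { return data_; }
  size_t size() const { return size_; }

private:
  const uint8_t* data_ = nullptr;
  size_t size_ = 0;
};
```

Random access is then just pointer arithmetic, e.g. `file.data()[offset]`. With Cap'n Proto, the mapped bytes could be handed to `capnp::FlatArrayMessageReader` as an array of words, so that the message is read in place rather than copied into RAM first.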

If mounting a remote filesystem is not an option, it is technically possible to do everything in userspace instead, but it's tricky. Essentially, you can implement a memory mapping entirely in userspace by writing your own signal handler for SIGSEGV. At startup, you would create an anonymous memory mapping that is at least the size of your remote file and is marked to prohibit reading. When your program attempts to read from this space, a SIGSEGV signal is raised. In your signal handler, you look at which address the code was trying to access (from si_addr in the siginfo_t), fetch the appropriate page from the remote server, map that page into the right place in local memory, and then mark it as readable. On return from the signal handler, the code continues on with the newly-mapped data.

This is, of course, pretty advanced systems hacking, and unfortunately I don't know of a library that does it for you (though I bet one exists... somewhere).

Otherwise, you need to split your data into smaller pieces that your application knows how to fetch explicitly as needed...

-Kenton

On Thu, Aug 16, 2018 at 10:32 AM, <[email protected]> wrote:
> Hi,
>
> I'm investigating using Cap'n Proto as the basis for a format containing a large collection of r-tree indexed data. The typical access pattern would be to query the index, resulting in a set of nodes in the tree. The collection of data would be physically clustered on node indices so that one can efficiently seek and read the data items for the searched node indexes.
>
> The recommendation for random access has been to simply use mmap, which I assume would work well in this case, but AFAIK it's something that is only used for files readily available on attached block storage. However, in this case the full dataset might very well be too large to keep locally, and the preferred access method would be streaming access over the network with the same pattern of random access using index searches.
>
> I'm a C++ novice and I fail to understand if something remotely like this can be done already with the reference C++ implementation. Indeed, I have not even been able to understand if it supports sequential streaming access of a part of a message - it seems assumed that a message is fully read into RAM, except when using mmap, which would then be the only way to partially read a message (sequential or random). But I do not want to give up yet, perhaps there is something I'm missing?
>
> Regards,
>
> Björn
>
> --
> You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> Visit this group at https://groups.google.com/group/capnproto.
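
For reference, the userspace mapping described in the reply (reserve an unreadable region, catch SIGSEGV, fill in the faulting page, mark it readable) could be sketched roughly as below. This is only an illustration under several assumptions: Linux, a single-threaded program, and a hypothetical `fetchRemotePage()` that stands in for the network fetch by filling each page with its own page index. `mapRemote()` and `pagesFetched()` are likewise made-up names. Note also that mprotect() is not formally async-signal-safe under POSIX, although calling it from a handler works in practice on Linux.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t* g_base = nullptr;           // start of the reserved region
static size_t g_size = 0;                   // region size, rounded up to whole pages
static size_t g_pageSize = 0;
static volatile sig_atomic_t g_faults = 0;  // number of pages fetched so far

// Hypothetical stand-in for the remote fetch: a real version would request
// page `pageIndex` from a server. Here every byte is set to the page index.
static void fetchRemotePage(size_t pageIndex, uint8_t* dst, size_t len) {
  std::memset(dst, static_cast<int>(pageIndex & 0xFF), len);
}

static void onSegv(int, siginfo_t* info, void*) {
  auto addr = reinterpret_cast<uintptr_t>(info->si_addr);
  auto base = reinterpret_cast<uintptr_t>(g_base);
  if (g_base == nullptr || addr < base || addr >= base + g_size) {
    // Not our region: restore the default action and return, so the faulting
    // instruction re-executes and the process crashes as it normally would.
    signal(SIGSEGV, SIG_DFL);
    return;
  }
  size_t pageIndex = (addr - base) / g_pageSize;
  uint8_t* page = g_base + pageIndex * g_pageSize;

  // Make the page writable, fill it, then leave it read-only. With threads,
  // one would instead fill a scratch page and map it into place atomically.
  mprotect(page, g_pageSize, PROT_READ | PROT_WRITE);
  fetchRemotePage(pageIndex, page, g_pageSize);
  mprotect(page, g_pageSize, PROT_READ);
  g_faults = g_faults + 1;
}

// Reserves an inaccessible region of at least `size` bytes and installs the
// handler. Reads from the returned pointer fault pages in on demand.
const uint8_t* mapRemote(size_t size) {
  g_pageSize = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  g_size = (size + g_pageSize - 1) / g_pageSize * g_pageSize;

  void* p = mmap(nullptr, g_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
  g_base = static_cast<uint8_t*>(p);

  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = onSegv;
  sa.sa_flags = SA_SIGINFO;  // needed to receive siginfo_t (and si_addr)
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, nullptr) != 0) throw std::runtime_error("sigaction failed");
  return g_base;
}

int pagesFetched() { return static_cast<int>(g_faults); }
```

A caller would do something like `const uint8_t* data = mapRemote(fileSize);` and then read `data[offset]` as if the whole file were in memory; the first touch of each page triggers one fetch, and later reads of the same page go straight to local memory.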
