On Thu, Jul 20, 2017 at 5:44 PM, Farid Zakaria <[email protected]>
wrote:

> Finally (sorry I keep making separate messages) --
>
> The reason why I was seeking a FdInputStream solution is because it seems
> to be much faster than an MMAP solution.
> Although my file is quite large (10GB) -- memory is not much of a concern.
>

This is very surprising. Can you show your complete code that is faster
with InputStreamMessageReader than with mmap()? Probably there is a problem
in the code that causes the difference.

> How does one copy from InputStreamMessageReader into the
> MallocMessageReader ?
>

I assume you mean MallocMessageBuilder. You would do:

    builder.setRoot(reader.getRoot<Type>());

-Kenton


>
> On Thursday, July 20, 2017 at 5:30:30 PM UTC-7, Farid Zakaria wrote:
>>
>> I actually had to store the FlatArrayMessageReader rather than the
>> Message::Reader for it to work?
>> I think I'm not grokking why that matters -- I thought
>> FlatArrayMessageReader is just a pointer into the MMAP file.
>> Why would it matter if I cast it to the reader?
>>
>>
>> hmm.
>>
>> On Thursday, July 20, 2017 at 5:25:00 PM UTC-7, Farid Zakaria wrote:
>>>
>>> All the items in my message array seem to be always pointing to the last
>>> item read.
>>> I'm not sure what I'm doing wrong here.
>>>
>>>
>>> auto messages = std::make_unique<std::deque<Message::Reader *> >(10);
>>>
>>> while (words.size() > 0) {
>>>     capnp::FlatArrayMessageReader * reader = new capnp::FlatArrayMessageReader(words);
>>>     Message::Reader message = reader->getRoot<Message>();
>>>     words = kj::arrayPtr(message->getEnd(), words.end());
>>>     messages->at(index++) = & message;
>>> }
>>>
>>>
>>> On Thursday, July 20, 2017 at 4:35:29 PM UTC-7, Kenton Varda wrote:
>>>>
>>>> On Thu, Jul 20, 2017 at 3:40 PM, Farid Zakaria <[email protected]>
>>>>  wrote:
>>>>
>>>>> Is MMAP the only way to randomly seek to an offset in the file?
>>>>>
>>>>> I can't seem to find a way to do that with kj::FdInputStream ?
>>>>>
>>>>>
>>>>> I'm trying to create an index of the elements in the file.
>>>>>
>>>>
>>>> kj::InputStream doesn't assume the stream is seekable and doesn't track
>>>> the current location. You could create a custom wrapper around InputStream
>>>> or around BufferedInputStream that remembers how many bytes have been read.
>>>> You can also lseek() the underlying fd directly, though of course you'll
>>>> have to discard any buffers after that.
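The wrapper idea above can be sketched without KJ at all. The following is only an illustration on a plain POSIX file descriptor, not the kj::InputStream interface: the class name CountingFdReader and its methods are hypothetical. It counts the bytes it hands out, so the offset of the next read is always known, and it exposes lseek() for jumping around.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: owns a read-only fd and tracks the current offset.
class CountingFdReader {
public:
  explicit CountingFdReader(const std::string& path)
      : fd_(::open(path.c_str(), O_RDONLY)) {
    if (fd_ < 0) throw std::runtime_error("open failed: " + path);
  }
  ~CountingFdReader() { ::close(fd_); }
  CountingFdReader(const CountingFdReader&) = delete;
  CountingFdReader& operator=(const CountingFdReader&) = delete;

  // Reads up to `size` bytes; returns the number actually read (0 at EOF).
  std::size_t read(void* buffer, std::size_t size) {
    assert(buffer != nullptr || size == 0);
    ssize_t n = ::read(fd_, buffer, size);
    if (n < 0) throw std::runtime_error("read failed");
    position_ += static_cast<std::size_t>(n);
    return static_cast<std::size_t>(n);
  }

  // Bytes consumed so far, i.e. the file offset of the next read.
  std::size_t position() const { return position_; }

  // Jumps to an absolute offset. Anything buffered by a layer above this
  // one is stale after the call and has to be discarded by the caller.
  void seek(std::size_t offset) {
    if (::lseek(fd_, static_cast<off_t>(offset), SEEK_SET) < 0) {
      throw std::runtime_error("lseek failed");
    }
    position_ = offset;
  }

private:
  int fd_;
  std::size_t position_ = 0;
};
```

Recording position() just before each message is read would give the offsets needed for an index. The same bookkeeping would have to live inside a kj::InputStream subclass to be usable with InputStreamMessageReader.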
>>>>
>>>> But indeed, if you use mmap() this will all be a lot easier, and
>>>> faster. I highly recommend using mmap() here.
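To make the mmap() suggestion concrete for the indexing use case, here is a minimal sketch using only POSIX and the standard library. It deliberately does not use Cap'n Proto: the record framing (a 4-byte native-endian length prefix followed by the payload) is an assumption made up for the example, and MappedFile, buildIndex and recordAt are hypothetical names. With real Cap'n Proto messages, the offsets would instead come from where each FlatArrayMessageReader ends.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical RAII wrapper around a read-only mapping of a whole file.
class MappedFile {
public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0 || st.st_size == 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed or file is empty: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // The mapping stays valid after the fd is closed.
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
    data_ = static_cast<const unsigned char*>(p);
  }
  ~MappedFile() { ::munmap(const_cast<unsigned char*>(data_), size_); }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const unsigned char* data() const { return data_; }
  std::size_t size() const { return size_; }

private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};

// Assumed toy framing (NOT Cap'n Proto's): each record is a 4-byte
// native-endian length followed by that many payload bytes.
// Returns the byte offset of every record's length prefix.
std::vector<std::size_t> buildIndex(const unsigned char* data,
                                    std::size_t size) {
  std::vector<std::size_t> offsets;
  std::size_t pos = 0;
  while (pos + sizeof(std::uint32_t) <= size) {
    std::uint32_t len;
    std::memcpy(&len, data + pos, sizeof(len));
    if (len > size - pos - sizeof(len)) break;  // Truncated record: stop.
    offsets.push_back(pos);
    pos += sizeof(len) + len;
  }
  return offsets;
}

// Random access: jump straight to record i through the index.
std::string recordAt(const MappedFile& file,
                     const std::vector<std::size_t>& index, std::size_t i) {
  assert(i < index.size());
  const unsigned char* p = file.data() + index[i];
  std::uint32_t len;
  std::memcpy(&len, p, sizeof(len));
  return std::string(reinterpret_cast<const char*>(p + sizeof(len)), len);
}
```

Once the index exists, "seeking" is just pointer arithmetic on the mapping, which is why no stream position needs to be tracked.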
>>>>
>>>> On Thu, Jul 20, 2017 at 4:14 PM, Farid Zakaria <[email protected]>
>>>> wrote:
>>>>
>>>>> One more question =)
>>>>>
>>>>> I need to copy the root from a FdStream to a vector
>>>>> Do I need to copy it into a MallocMessageBuilder ?
>>>>>
>>>>
>>>> With InputStreamMessageReader, yes. You have to destroy the
>>>> InputStreamMessageReader before you can read the next message, and that
>>>> invalidates the root Reader and all other Readers pointing into it.
>>>>
>>>> However, with the mmap strategy, you don't need to delete the
>>>> FlatArrayMessageReader before reading the next message. So, you can
>>>> allocate them on the heap and put them into your vector, and then all the
>>>> Readers pointing into them remain valid, as long as the
>>>> FlatArrayMessageReaders exist and the memory is still mapped. (In this case
>>>> you should remove the madvise() line since you plan to go back and randomly
>>>> access the data later.)
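The ownership pattern described above can be shown with plain C++ stand-ins. This is a sketch under assumptions, not Cap'n Proto code: Owner plays the role of a heap-allocated FlatArrayMessageReader, View plays the role of a root Reader, and both types plus loadAll are hypothetical. The point is that the owners are kept alive in a container and the views are stored by value. Storing the address of a loop-local view instead would leave every entry pointing at a variable that no longer exists, which is one plausible reading of the "everything points to the last item" symptom reported later in the thread.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an owning reader such as FlatArrayMessageReader (hypothetical).
struct Owner {
  explicit Owner(std::string p) : payload(std::move(p)) {}
  std::string payload;
};

// Stand-in for a cheap, copyable view such as a root Reader (hypothetical).
struct View {
  const std::string* payload;
  const std::string& get() const { return *payload; }
};

struct Loaded {
  std::vector<std::unique_ptr<Owner>> owners;  // Keeps every owner alive.
  std::vector<View> views;                     // Views are stored by value.
};

Loaded loadAll(const std::vector<std::string>& inputs) {
  Loaded out;
  for (const std::string& input : inputs) {
    auto owner = std::make_unique<Owner>(input);
    View view{&owner->payload};  // Points into heap memory held by `owner`.
    // Pushing `&view` instead would record the address of a loop-local
    // variable that dies at the end of this iteration.
    out.views.push_back(view);
    out.owners.push_back(std::move(owner));
  }
  assert(out.views.size() == out.owners.size());
  return out;
}
```

Because each Owner lives on the heap, moving the containers around does not move the data the views point at. The views stay valid for as long as the Loaded object does.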
>>>>
>>>> Again, I *highly* recommend this strategy instead of using a stream.
>>>> With the mmap strategy, not only do you avoid copying into a builder, but
>>>> you avoid copying the underlying data when you read it. The operating
>>>> system causes the memory addresses to point directly at its in-memory cache
>>>> of the file data. If multiple programs mmap() the same file, they share the
>>>> memory, rather than creating their own copies. Moreover, the operating
>>>> system is free to evict the data from memory and then load it again later
>>>> on-demand. There are tons of advantages to this approach and it is exactly
>>>> what Cap'n Proto is designed to enable.
>>>>
>>>> -Kenton
>>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "Cap'n Proto" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> Visit this group at https://groups.google.com/group/capnproto.
>
