Hi Kenton

Thanks so much for your detailed answer.

On Wednesday, June 21, 2023 at 11:32:39 PM UTC+2 [email protected] 
wrote:

> Hi Adrian,
>
> The memory usage you are seeing happens whether or not you use mmap, it's 
> just accounted differently. If you read the file using many small read() 
> calls, the operating system will still load all of the pages of the file 
> into memory, and will only remove them from memory when the memory is used 
> for something else. That's called caching. But when you use read(), the 
> memory isn't attached directly to your program, it's just in kernel space, 
> so it doesn't look like your program is using a lot of memory, even though 
> it is.
>
> But using memory this way is not really consuming it. The memory is still 
> available for anything else that needs it. Since the memory is still 
> available, it's incorrect to think of it the same as memory your program 
> allocated for private use.
>
> Put simply, your program is not using the memory you think it is. You need 
> to understand what the numbers actually mean.
>
> -Kenton
>
> On Wed, Jun 21, 2023 at 3:53 PM Adrian <[email protected]> wrote:
>
>> Hi, thanks for your reply.
>>
>> I really appreciate your work in this library.
>>
>> I used the /bin/time utility on Linux, but I also saw the same result with 
>> another memory analyzer.
>>
>> As I mentioned, the capnp database can be very big, so my aim is to reduce 
>> memory usage when reading data from it. When I read small portions of the 
>> database, I do not want my program to consume so much memory. The 
>> documentation refers to mmap as a way to achieve this. Do you think the 
>> approach I implemented in my code is wrong for that purpose?
>>
>> Thanks
>>
>> On Wednesday, June 21, 2023 at 10:45:21 PM UTC+2 [email protected] 
>> wrote:
>>
>>> Hi Adrian,
>>>
>>> How are you measuring memory usage, exactly?
>>>
>>> When using mmap, measuring memory usage gets a bit complicated. The 
>>> kernel will load pages of the file into memory when you read them, and then 
>>> it is free to discard those pages at any time -- because it can always load 
>>> them again later if needed. But the kernel will only actually discard pages 
>>> if it needs the memory for something else. So if you read the entire file 
>>> by mmap-ing it and reading every page, and nothing else needs memory, then 
>>> all those pages will stay resident in memory. But this isn't really the 
>>> same as your program allocating memory, because, again, all those pages can 
>>> be freed up instantly whenever memory is needed.
>>>
>>> In order to fully understand what is going on you may have to dig into 
>>> more detailed memory stats. If your OS is just giving you a single number 
>>> for memory usage, it isn't telling the full story. Usually you can find a 
>>> bunch of different statistics if you dig in a little more.
>>>
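One place to find more detailed numbers on Linux is /proc/self/status, which on reasonably recent kernels splits the resident set into RssAnon (private allocations) and RssFile (file-backed pages such as an mmap-ed database). The following is a minimal sketch under that assumption; `readStatusKb` is a hypothetical helper name, and the fields available vary by kernel version.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Parses a file in the /proc/<pid>/status format and returns every field
// whose value is reported in kB (for example VmRSS, RssAnon, RssFile),
// keyed by field name without the trailing colon.
std::map<std::string, long> readStatusKb(
    const std::string& path = "/proc/self/status") {
    std::map<std::string, long> fields;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream parts(line);
        std::string key, unit;
        long value = 0;
        // Keep only lines shaped like "VmRSS:   1234 kB".
        if (parts >> key >> value >> unit && unit == "kB" &&
            !key.empty() && key.back() == ':') {
            key.pop_back();
            fields[key] = value;
        }
    }
    return fields;
}
```

If most of VmRSS turns out to be RssFile rather than RssAnon, that would match the explanation above: the pages belong to the mapped file and can be discarded by the kernel.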
>>> -Kenton
>>>
>>> On Wed, Jun 21, 2023 at 9:47 AM Adrian <[email protected]> wrote:
>>>
>>>> Hello
>>>>
>>>> I have been running some tests with Cap'n Proto. My aim is to read 
>>>> small chunks of a large serialized message while keeping total memory 
>>>> consumption low. For that purpose, I used memory-mapped reading and 
>>>> wrote a simple example to test memory usage. 
>>>>
>>>> In the tests, I noticed that even if I only read the small data chunk 
>>>> (address), which contains just the string "Address", the total memory 
>>>> usage of the test program below is 512 MB on my machine (the capnp 
>>>> database is 2.1GB). I am wondering what I am doing wrong. Note: I only 
>>>> run the program in "read" mode. I called "write" once to create the 
>>>> capnp database.
>>>>
>>>> If you have any thoughts, I would be very happy to hear them.
>>>>
>>>> *Proto file*
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>> @0xa5af5d9c9e54c04a;
>>>>
>>>> struct Person {
>>>>   name @0 :Text;
>>>>   id @1 :UInt32;
>>>>   email @2 :Text;
>>>>   address @3 :Text;
>>>> }
>>>>
>>>> struct AddressBook {
>>>>   people @0 :List(Person);
>>>> }
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>>
>>>> Source code of example
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>>
>>>> #include "test.capnp.h"
>>>> #include <capnp/message.h>
>>>> #include <capnp/serialize-packed.h>
>>>> #include <capnp/serialize.h>
>>>> #include <cstring>
>>>> #include <iostream>
>>>> #include <stdexcept>
>>>> #include <string>
>>>> #include <fcntl.h>
>>>> #include <sys/mman.h>
>>>> #include <sys/stat.h>
>>>> #include <unistd.h>
>>>> #include <stdlib.h>
>>>>
>>>> void writeAddressBook(int fd)
>>>> {
>>>>     constexpr size_t NodeNumber = 1024 * 8;
>>>>
>>>>     ::capnp::MallocMessageBuilder message;
>>>>
>>>>     AddressBook::Builder addressBook = message.initRoot<AddressBook>();
>>>>     ::capnp::List<Person>::Builder people =
>>>>         addressBook.initPeople(NodeNumber);
>>>>
>>>>     // Each string will be 128KB.
>>>>     constexpr size_t size = 1024 * 128;
>>>>
>>>>     for (unsigned int i = 0; i < NodeNumber; i++)
>>>>     {
>>>>         Person::Builder person = people[i];
>>>>         person.setId(i);
>>>>         person.setName(std::string(size, 'A').c_str());
>>>>         person.setEmail(std::string(size, 'A').c_str());
>>>>         person.setAddress("Address");
>>>>     }
>>>>
>>>>     // Serialize the whole message into an in-memory buffer.
>>>>     kj::VectorOutputStream output;
>>>>     writeMessage(output, message);
>>>>
>>>>     auto serializedData = output.getArray();
>>>>     const char *dataPtr =
>>>>         reinterpret_cast<const char *>(serializedData.begin());
>>>>     size_t dataSize = serializedData.size();
>>>>
>>>>     // write() may write fewer bytes than requested, so loop until done.
>>>>     size_t totalBytesWritten = 0;
>>>>     while (totalBytesWritten < dataSize)
>>>>     {
>>>>         auto numberOfBytesWritten = write(fd, dataPtr + totalBytesWritten,
>>>>                                           dataSize - totalBytesWritten);
>>>>         if (numberOfBytesWritten == -1)
>>>>         {
>>>>             throw std::runtime_error{"Error during creating capnp database"};
>>>>         }
>>>>         totalBytesWritten += numberOfBytesWritten;
>>>>     }
>>>> }
>>>>
>>>> void readAddressBook(int fd)
>>>> {
>>>>     struct stat st;
>>>>     if (fstat(fd, &st) == -1)
>>>>     {
>>>>         throw std::runtime_error{"Error during fstat of capnp database"};
>>>>     }
>>>>     size_t fileSize = st.st_size;
>>>>
>>>>     // Map the whole file read-only; pages are loaded lazily on access.
>>>>     void *mapping = mmap(nullptr, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
>>>>     if (mapping == MAP_FAILED)
>>>>     {
>>>>         throw std::runtime_error{"Error during mmap of capnp database"};
>>>>     }
>>>>     char *mappedData = static_cast<char *>(mapping);
>>>>
>>>>     capnp::FlatArrayMessageReader reader(kj::ArrayPtr<const capnp::word>(
>>>>         reinterpret_cast<const capnp::word *>(mappedData),
>>>>         fileSize / sizeof(capnp::word)));
>>>>
>>>>     AddressBook::Reader addressBook = reader.getRoot<AddressBook>();
>>>>
>>>>     for (Person::Reader person : addressBook.getPeople())
>>>>     {
>>>>         person.getId();
>>>>     }
>>>>
>>>>     munmap(mappedData, fileSize);
>>>>     close(fd);
>>>> }
>>>>
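>>>> int main(int argc, char **argv)
>>>> {
>>>>     if (argc < 2)
>>>>     {
>>>>         std::cerr << "usage: " << argv[0] << " --write|--read\n";
>>>>         return 1;
>>>>     }
>>>>
>>>>     int fd = open("./data.bin", O_RDWR);
>>>>     if (fd == -1)
>>>>     {
>>>>         std::cerr << "could not open ./data.bin\n";
>>>>         return 1;
>>>>     }
>>>>
>>>>     if (!std::strcmp(argv[1], "--write"))
>>>>     {
>>>>         writeAddressBook(fd);
>>>>     }
>>>>
>>>>     if (!std::strcmp(argv[1], "--read"))
>>>>     {
>>>>         readAddressBook(fd);
>>>>     }
>>>>
>>>>     return 0;
>>>> }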
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Cap'n Proto" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/capnproto/a3192b90-a8bf-4151-84e8-0b8516d8f71bn%40googlegroups.com.
>>>>

