Hi Kenton,

Thanks so much for your detailed answer.
On Wednesday, June 21, 2023 at 11:32:39 PM UTC+2 [email protected] wrote:

> Hi Adrian,
>
> The memory usage you are seeing happens whether or not you use mmap, it's
> just accounted differently. If you read the file using many small read()
> calls, the operating system will still load all of the pages of the file
> into memory, and will only remove them from memory when the memory is used
> for something else. That's called caching. But when you use read(), the
> memory isn't attached directly to your program, it's just in kernel space,
> so it doesn't look like your program is using a lot of memory, even though
> it is.
>
> But using memory this way is not really consuming it. The memory is still
> available for anything else that needs it. Since the memory is still
> available, it's incorrect to think of it the same as memory your program
> allocated for private use.
>
> Put simply, your program is not using the memory you think it is. You need
> to understand what the numbers actually mean.
>
> -Kenton
>
> On Wed, Jun 21, 2023 at 3:53 PM Adrian <[email protected]> wrote:
>
>> Hi, thanks for your reply.
>>
>> I really appreciate your work in this library.
>>
>> I used /bin/time utility of Linux but I also saw the same result with
>> another memory analyzer.
>>
>> As I mentioned, since the file could be big, my aim is to reduce memory
>> usage when reading data from capnp database because it could be very big.
>> When I read small portions of that database, I want my program not to
>> consume so much memory. In the documentation, you refer to mmap usage to
>> achieve this. Do you think that my approach is wrong for that purpose like
>> I implemented in my code?
>>
>> Thanks
>>
>> On Wednesday, June 21, 2023 at 10:45:21 PM UTC+2 [email protected]
>> wrote:
>>
>>> Hi Adrian,
>>>
>>> How are you measuring memory usage, exactly?
>>>
>>> When using mmap, measuring memory usage gets a bit complicated.
>>> The kernel will load pages of the file into memory when you read them,
>>> and then it is free to discard those pages at any time -- because it can
>>> always load them again later if needed. But the kernel will only actually
>>> discard pages if it needs the memory for something else. So if you read
>>> the entire file by mmap-ing it and reading every page, and nothing else
>>> needs memory, then all those pages will stay resident in memory. But this
>>> isn't really the same as your program allocating memory, because, again,
>>> all those pages can be freed up instantly whenever memory is needed.
>>>
>>> In order to fully understand what is going on you may have to dig into
>>> more detailed memory stats. If your OS is just giving you a single number
>>> for memory usage, it isn't telling the full story. Usually you can find a
>>> bunch of different statistics if you dig in a little more.
>>>
>>> -Kenton
>>>
>>> On Wed, Jun 21, 2023 at 9:47 AM Adrian <[email protected]> wrote:
>>>
>>>> Hello
>>>>
>>>> I have been working on Cap'n Proto for some time to make some tests. My
>>>> aim is to read the small chunks in a big serialized data to reduce the
>>>> total memory consumption. For that purpose, I used memory-mapped reading
>>>> and wrote a simple example to make some memory usage tests.
>>>>
>>>> In the tests, I realized that even if I only read the small data chunk
>>>> (address) only include "address" string in itself, the total memory usage
>>>> of the below test program is 512 MB in my machine (the capnp database is
>>>> 2.1GB). I am wondering where I am doing something wrong. Note: I run the
>>>> program only "read" mode. I called the "write" once to create capnp
>>>> database.
>>>>
>>>> If you have any opinion, I would be very happy if you share it with me.
>>>>
>>>> *Proto file*
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>> @0xa5af5d9c9e54c04a;
>>>>
>>>> struct Person {
>>>>   name @0 :Text;
>>>>   id @1 :UInt32;
>>>>   email @2 :Text;
>>>>   address @3 :Text;
>>>> }
>>>>
>>>> struct AddressBook {
>>>>   people @0 :List(Person);
>>>> }
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>>
>>>> Source code of example
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>>
>>>> #include "test.capnp.h"
>>>> #include <capnp/message.h>
>>>> #include <capnp/serialize-packed.h>
>>>> #include <capnp/serialize.h>
>>>> #include <cstring>
>>>> #include <iostream>
>>>> #include <stdexcept>
>>>> #include <string>
>>>> #include <fcntl.h>
>>>> #include <sys/mman.h>
>>>> #include <sys/stat.h>
>>>> #include <unistd.h>
>>>> #include <stdlib.h>
>>>>
>>>> // Builds an address book in memory and writes it to fd.
>>>> void writeAddressBook(int fd)
>>>> {
>>>>     constexpr const size_t NodeNumber = 1024 * 8;
>>>>
>>>>     ::capnp::MallocMessageBuilder message;
>>>>
>>>>     AddressBook::Builder addressBook = message.initRoot<AddressBook>();
>>>>     ::capnp::List<Person>::Builder people =
>>>>         addressBook.initPeople(NodeNumber);
>>>>
>>>>     // Each string will be 128KB.
>>>>     constexpr const size_t size = 1024 * 128;
>>>>
>>>>     for (uint32_t i = 0; i < NodeNumber; i++)
>>>>     {
>>>>         Person::Builder person = people[i];
>>>>         person.setId(i);
>>>>         person.setName(std::string(size, 'A').c_str());
>>>>         person.setEmail(std::string(size, 'A').c_str());
>>>>         person.setAddress("Address");
>>>>     }
>>>>
>>>>     kj::VectorOutputStream output;
>>>>     writeMessage(output, message);
>>>>
>>>>     auto serializedData = output.getArray();
>>>>
>>>>     const char *dataPtr =
>>>>         reinterpret_cast<const char *>(serializedData.begin());
>>>>     size_t dataSize = serializedData.size();
>>>>
>>>>     // write() may write fewer bytes than requested, so loop until done.
>>>>     size_t totalBytesWritten = 0;
>>>>     while (totalBytesWritten < dataSize)
>>>>     {
>>>>         auto numberOfBytesWritten = write(fd, dataPtr + totalBytesWritten,
>>>>                                           dataSize - totalBytesWritten);
>>>>         if (numberOfBytesWritten == -1)
>>>>         {
>>>>             throw std::runtime_error{"Error during creating capnp database"};
>>>>         }
>>>>         totalBytesWritten += numberOfBytesWritten;
>>>>     }
>>>> }
>>>>
>>>> // Memory-maps the file behind fd and reads only the id of each person.
>>>> void readAddressBook(int fd)
>>>> {
>>>>     struct stat st;
>>>>     if (fstat(fd, &st) == -1)
>>>>     {
>>>>         throw std::runtime_error{"Error during fstat of capnp database"};
>>>>     }
>>>>     size_t fileSize = st.st_size;
>>>>
>>>>     void *mapping = mmap(nullptr, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
>>>>     if (mapping == MAP_FAILED)
>>>>     {
>>>>         throw std::runtime_error{"Error during mmap of capnp database"};
>>>>     }
>>>>     char *mappedData = static_cast<char *>(mapping);
>>>>
>>>>     {
>>>>         // Scoped so that the reader is destroyed before munmap().
>>>>         capnp::FlatArrayMessageReader reader(kj::ArrayPtr<const capnp::word>(
>>>>             reinterpret_cast<const capnp::word *>(mappedData),
>>>>             fileSize / sizeof(capnp::word)));
>>>>
>>>>         AddressBook::Reader addressBook = reader.getRoot<AddressBook>();
>>>>
>>>>         for (Person::Reader person : addressBook.getPeople())
>>>>         {
>>>>             person.getId();
>>>>         }
>>>>     }
>>>>
>>>>     munmap(mappedData, fileSize);
>>>>     close(fd);
>>>> }
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     if (argc < 2)
>>>>     {
>>>>         std::cerr << "usage: " << argv[0] << " --write|--read\n";
>>>>         return 1;
>>>>     }
>>>>
>>>>     // O_CREAT so that --write works when data.bin does not exist yet.
>>>>     int fd = open("./data.bin", O_RDWR | O_CREAT, 0644);
>>>>     if (fd == -1)
>>>>     {
>>>>         std::cerr << "Cannot open ./data.bin\n";
>>>>         return 1;
>>>>     }
>>>>
>>>>     if (!std::strcmp(argv[1], "--write"))
>>>>     {
>>>>         writeAddressBook(fd);
>>>>         close(fd);
>>>>     }
>>>>
>>>>     if (!std::strcmp(argv[1], "--read"))
>>>>     {
>>>>         readAddressBook(fd);
>>>>     }
>>>>
>>>>     return 0;
>>>> }
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Cap'n Proto" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/capnproto/a3192b90-a8bf-4151-84e8-0b8516d8f71bn%40googlegroups.com
