A NOTE has been added to this issue.
======================================================================
http://www.dbmail.org/mantis/view.php?id=139
======================================================================
Reported By:                aaron
Assigned To:                paul
======================================================================
Project:                    DBMail
Issue ID:                   139
Category:                   IMAP daemon
Reproducibility:            always
Severity:                   feature
Priority:                   normal
Status:                     acknowledged
======================================================================
Date Submitted:             12-Dec-04 00:27 CET
Last Modified:              31-May-06 14:36 CEST
======================================================================
Summary:                    dbmail-imapd doesn't scale nicely with large message ranges
Description:
Thomas Mueller wrote:
Sometimes my server uses much more memory than it should for a few minutes - and I guess it's dbmail. I hope I'll find some time soon to use a profiler, but meanwhile I guess the following happens: someone marks a mailbox for offline use and dbmail-imapd does the following:
- fetch all mails from database
- keep the result set in memory
- deliver them

The third step can take a while, so the process eats lots of memory for quite some time - no bug, it's a design problem. This only happens for some minutes, that's why I'm quite sure it's no memory hole.

The way to go would be to use a server side cursor so only one mail has to be kept in memory - but AFAIK there's a storage system with SQL interface (sorry couldn't resist) that doesn't support cursors.
======================================================================

----------------------------------------------------------------------
aaron - 12-Dec-04 00:30
----------------------------------------------------------------------
Paul, you might know this code best, is there someplace in _ic_foo() that goes through a result list from the database and builds some huge thing in memory, and then begins to send it back to the client?

It might be as simple as placing some ic_write()'s in the middle of that loop, rather than building up the whole structure at all.

----------------------------------------------------------------------
paul - 12-Dec-04 09:34
----------------------------------------------------------------------
There is basically only one candidate: _ic_fetch

Thomas was referring to the use-case where users mark a mailbox for offline usage. This probably triggers something like:

    C: A001 UID FETCH 1:* (FULL)

This will first retrieve the full range of message_idnr with their flags using db_get_msginfo_range, and after that start retrieving the full messages one-by-one and dumping them to the client.

There is no place in the code where messageblks for more than one message at a time are selected.
There should be, once we can support cursors, but for now dbmail is on a one-message-at-a-time paradigm.

So, db_get_msginfo_range will build a large result-set that should scale well since it's holding only the message_idnr and the flags, and afterwards messageblks for these messages are selected, one message at a time.

It could be that this long loop that retrieves the full messages is 'leaking' memory during its run: rescaling the memory allocated for the cache to the largest message retrieved, and not releasing that memory until the end of the loop.

----------------------------------------------------------------------
aaron - 24-Mar-06 12:25
----------------------------------------------------------------------
How does this look in SVN trunk? I haven't tried it with any huge mailboxes myself...

----------------------------------------------------------------------
sayler - 24-Mar-06 16:59
----------------------------------------------------------------------
I'll see if I can produce a test case against my 90k-message Mbox. Generally, I've not noticed dbmail using lots of memory (but I'm running it on a server with 2 GB so it's possible I'm just not noticing).

Thomas -- if you know which mail client you're using it would help. I can try Thunderbird here easily.

----------------------------------------------------------------------
sayler - 24-Mar-06 17:58
----------------------------------------------------------------------
Paul,

Heap usage seems to grow pretty badly, and somewhat linearly, when doing a FETCH 1:* (FULL).

The core size reported by top grows from 2 MB on program load to around 30-35 MB for FETCH 1:1000 (FULL) on my mailbox. I had it up to about 105 MB on a FETCH 1:* before I killed it (I think it was between 2000 and 3000 messages into the run).

I'm learning to play with valgrind and friends now. I'll let you know if I see anything interesting.
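
paul's note of 12-Dec-04 suggests the per-message cache may be resized up to the largest message and held until the fetch loop ends. A minimal sketch of that pattern in plain C follows; `msg_cache`, `cache_load` and `cache_release` are hypothetical names for illustration and are not taken from the dbmail source. Note that a high-water-mark buffer like this is bounded by the largest message, so on its own it would not explain growth that is linear in the number of messages fetched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-session message cache; this is not
 * dbmail's actual structure. */
typedef struct {
    char  *data;      /* backing storage */
    size_t capacity;  /* bytes currently allocated */
    size_t length;    /* bytes used by the current message */
} msg_cache;

/* Load one message into the cache, growing the allocation only when the
 * message does not fit. Capacity never shrinks here, so it tracks the
 * largest message seen so far. Returns 0 on success, -1 on failure. */
static int cache_load(msg_cache *c, const char *msg, size_t len)
{
    if (len > c->capacity) {
        char *p = realloc(c->data, len);
        if (p == NULL)
            return -1;
        c->data = p;
        c->capacity = len;
    }
    if (len > 0)
        memcpy(c->data, msg, len);
    c->length = len;
    return 0;
}

/* Release the storage between messages so that one large message does
 * not pin memory for the rest of the fetch loop. */
static void cache_release(msg_cache *c)
{
    free(c->data);
    c->data = NULL;
    c->capacity = 0;
    c->length = 0;
}
```

Calling `cache_release` after each message trades extra allocations for a lower resident size; whether that trade is worthwhile depends on the message size distribution.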
----------------------------------------------------------------------
sayler - 24-Mar-06 19:59
----------------------------------------------------------------------
OK, if I understand this right, memory is being allocated by _set_content_from_stream (in g_mime_stream_write_string) and then never freed. I'm not sure if this is a bug in g_mime or the way we use it (or a misunderstanding on my part).

The attached valgrind/massif plots are from me doing:

    1 LOGIN XXX YYY
    2 SELECT INBOX
    3 FETCH 1:700 (FULL)
    4 LOGOUT

against a 90k-message INBOX.

After fetching 700 messages, we have around 10 MB of heap allocated by g_mime_stream_write_string. I *think* the rest of the usage is legit.

Anyone (Paul, Aaron?) care to comment? I don't understand Gmime and our usage of it well enough to dig very deep (yet).

----------------------------------------------------------------------
aaron - 24-Mar-06 22:09
----------------------------------------------------------------------
I don't understand how your suggested callchain works. Here's what I see in the code:

lmtp.c/main.c
    dbmail_message_new_from_stream
    dbmail_message_init_with_stream
    _set_content_from_stream

OK, must not be this one, this is delivery...

dbmail-imapsession.c, dbmail-mailbox.c, dbmail-message.c call:
    db_init_fetch
    dbmail_message_retrieve
    _fetch_head/_fetch_full
    _retrieve
    dbmail_message_init_with_string
    _set_content
    _set_content_from_stream

OK, here we go. Something going wrong here...

Found it: char *buf = g_new0... buf only gets freed in case DBMAIL_STREAM_LMTP and DBMAIL_STREAM_PIPE, but not from default or case DBMAIL_STREAM_RAW.

Ok, try the latest SVN. I just wrapped buf inside an anon block only in the part of the switch block where it gets used.

----------------------------------------------------------------------
sayler - 24-Mar-06 23:43
----------------------------------------------------------------------
No dice. I still get a big chunk of stuff allocated.
Here's my (partial) theory after poring over the code today:

The problem doesn't happen (i.e. memory usage is nice and flat) if I do a FETCH (INTERNALDATE) or even FETCH (RFC822.HEADER).

We get massive bloat because we're retaining the whole body after parsing the BODYSTRUCTURE (which is needed by FULL).

As far as I can tell, the imap_cache is correctly freeing all its resources after the new message is swapped in every iteration of the fetch loop (when the message id of self and the cache mismatch).

However, valgrind seems to think that the memory from g_mime_stream_write_string is never reclaimed. g_mime_stream_write_string is used to convert the content as a GString into a GMIME object.

The weird thing is as far as I can tell everything is being cleaned up properly. If I add up the outputs of all the "dbmail-imapsession.c,_imap_cache_update: cache size [XXX]" lines in my log file, it seems to be about the amount of memory leaked in g_mime_stream_write_string.

Any thoughts?

----------------------------------------------------------------------
aaron - 25-Mar-06 10:37
----------------------------------------------------------------------
Is it only valgrind complaining, or are you also seeing this reflected in the VIRT/RES/SHR sizes? I recall Paul saying that Valgrind has some issues keeping track of Glib...

----------------------------------------------------------------------
sayler - 25-Mar-06 11:54
----------------------------------------------------------------------
See comment 0001057, yes. I see the size reported by top growing linearly as the fetch proceeds. After FETCH FULL of a couple thousand headers, core size has grown to >100 MB.

----------------------------------------------------------------------
paul - 25-Mar-06 17:58
----------------------------------------------------------------------
I'm expanding the test suite in check_dbmail_imapd so I can get a handle on this. Hang in there...
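
aaron's note 0001059 describes a scratch buffer that was allocated before a switch statement but freed only in some of its cases, and a fix that moves the buffer into an anonymous block inside the branch that uses it. The following is a rough sketch of that scoping idea in plain C, not the actual dbmail code: `stream_kind`, `copy_stream` and `live_buffers` are made up for illustration, and `calloc`/`free` stand in for the GLib allocators. As sayler's follow-up shows, this change did not by itself stop the growth.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stream kinds named after the cases mentioned in the note; the enum
 * itself is a stand-in and is not copied from dbmail. */
typedef enum { STREAM_LMTP, STREAM_PIPE, STREAM_RAW } stream_kind;

static int live_buffers = 0;  /* scratch buffers allocated, not yet freed */

/* Copy src into dst (at most dst_size - 1 bytes, NUL-terminated). Only
 * the LMTP and PIPE kinds use a scratch buffer, so it is allocated and
 * freed inside that branch and no other path can leak it.
 * Returns the number of bytes copied, or -1 on error. */
static long copy_stream(stream_kind kind, const char *src,
                        char *dst, size_t dst_size)
{
    size_t n = strlen(src);

    if (dst_size == 0)
        return -1;
    if (n >= dst_size)
        n = dst_size - 1;

    switch (kind) {
    case STREAM_LMTP:
    case STREAM_PIPE: {
        char *buf = calloc(n + 1, 1);   /* branch-local scratch space */
        if (buf == NULL)
            return -1;
        live_buffers++;
        memcpy(buf, src, n);
        memcpy(dst, buf, n);
        free(buf);
        live_buffers--;
        break;
    }
    case STREAM_RAW:
    default:
        memcpy(dst, src, n);            /* no scratch buffer needed */
        break;
    }
    dst[n] = '\0';
    return (long)n;
}
```

Keeping the allocation and the matching free in the same block means a newly added case cannot inherit an unfreed buffer.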
----------------------------------------------------------------------
maenaka - 14-Apr-06 04:41
----------------------------------------------------------------------
Hi there.

patch-dbmail-message.c will reduce the `some minutes'. It doesn't fix the `much more memory than it should' issue though.

----------------------------------------------------------------------
aaron - 30-Apr-06 09:25
----------------------------------------------------------------------
Paul, do you have any feelings about switching from GMIME_STREAM_BUFFER_CACHE_READ to GMIME_STREAM_BUFFER_BLOCK_READ, per maenaka's patch?

----------------------------------------------------------------------
paul - 30-Apr-06 17:22
----------------------------------------------------------------------
It passes all the tests. I chose CACHE_READ instead of BLOCK_READ because the latter is for types that support seek. I don't think that is much of an issue in this case though. If BLOCK_READ is faster, let's go with that one for now.

Also, I wonder if this issue is not glib's and possibly even solved by the new slab allocator used in libglib-2.10.0.

----------------------------------------------------------------------
ryo - 23-May-06 12:48
----------------------------------------------------------------------
I cannot update to dbmail 2.1.6 from dbmail 2.1.2 because of this memory leak problem. This problem does not occur in dbmail 2.1.2.

I was able to follow the rest of the stack trace from comment 0001059 by using gdb and the pmap command. Please see the following.

    dbmail-message.c:_set_content_from_stream
    gmime-stream.c:g_mime_stream_write_string
    gmime-stream.c:g_mime_stream_write
    gmime-stream-mem.c:stream_write
    garray.c:g_byte_array_set_size
    garray.c:g_array_set_size
    garray.c:g_array_maybe_expand
    gmem.c:g_realloc

The new memory block is created at the following line in gmem.c:g_realloc, and this memory block is not freed even when the IMAP session is closed.

    mem = glib_mem_vtable.realloc (mem, n_bytes);

I do not know how to free this memory block. Any idea?

----------------------------------------------------------------------
ryo - 31-May-06 14:36
----------------------------------------------------------------------
It seems that the return values of the following functions need g_object_unref():
- g_mime_message_get_mime_part()
- g_mime_message_part_get_message()
- g_mime_multipart_get_part()

I think that the cause of this memory leak problem is forgetting to do this. For example, the source code of the g_mime_message_get_mime_part() function is as follows.

    GMimeObject *
    g_mime_message_get_mime_part (GMimeMessage *message)
    {
        g_return_val_if_fail (GMIME_IS_MESSAGE (message), NULL);

        if (message->mime_part == NULL)
            return NULL;

        g_object_ref (message->mime_part);

        return message->mime_part;
    }

This function calls g_object_ref() on message->mime_part, so I think the caller needs to call g_object_unref() afterwards.

I made a patch. Please see the attached file: dbmail-unref.patch

It seems that this patch resolves this memory leak problem. But since I am not familiar with the details of gmime, I am not confident. Is this patch correct?
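
The ownership rule ryo describes (a getter that takes a reference obliges the caller to drop it) can be illustrated without GMime at all. The sketch below uses a toy reference-counted type in plain C; `part`, `message`, `message_get_part` and the other names are hypothetical stand-ins rather than the GMime API, and it only mirrors the getter as quoted in the note. Other GMime releases may follow a different ownership convention, so the library documentation for the version in use should be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object standing in for a GObject. */
typedef struct {
    int refcount;
} part;

/* Toy message that owns one reference to its part. */
typedef struct {
    part *mime_part;
} message;

static int live_parts = 0;   /* parts allocated and not yet destroyed */

static part *part_new(void)
{
    part *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->refcount = 1;
    live_parts++;
    return p;
}

static void part_ref(part *p)
{
    p->refcount++;
}

static void part_unref(part *p)
{
    if (--p->refcount == 0) {
        free(p);
        live_parts--;
    }
}

/* Mirrors the quoted getter: returns the part with an extra reference
 * that the caller is responsible for dropping. */
static part *message_get_part(message *m)
{
    if (m->mime_part == NULL)
        return NULL;
    part_ref(m->mime_part);
    return m->mime_part;
}

/* Drops the message's own reference to its part. */
static void message_free(message *m)
{
    if (m->mime_part != NULL)
        part_unref(m->mime_part);
    m->mime_part = NULL;
}

/* A caller that balances the getter's reference. Returns the reference
 * count observed while both references are held. */
static int inspect_balanced(message *m)
{
    part *p = message_get_part(m);
    if (p == NULL)
        return 0;
    int seen = p->refcount;   /* the message's reference plus ours */
    part_unref(p);            /* give back the getter's reference */
    return seen;
}
```

If the caller skips the `part_unref` call, the count never reaches zero when the message is freed and the part stays allocated, which matches the symptom reported in this issue.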

Issue History
Date Modified    Username   Field                                Change
======================================================================
12-Dec-04 00:27  aaron      New Issue
12-Dec-04 00:30  aaron      Note Added: 0000439
12-Dec-04 09:34  paul       Note Added: 0000441
22-Aug-05 10:29  paul       Assigned To                           => paul
22-Aug-05 10:29  paul       Status                               new => acknowledged
22-Aug-05 10:29  paul       Projection                           none => redesign
22-Aug-05 10:29  paul       ETA                                  none => > 1 month
24-Mar-06 12:25  aaron      Note Added: 0001051
24-Mar-06 16:59  sayler     Note Added: 0001055
24-Mar-06 17:58  sayler     Note Added: 0001057
24-Mar-06 19:52  sayler     File Added: massif.27855.pdf
24-Mar-06 19:52  sayler     File Added: massif.27855.txt
24-Mar-06 19:59  sayler     Note Added: 0001058
24-Mar-06 22:09  aaron      Note Added: 0001059
24-Mar-06 23:43  sayler     Note Added: 0001060
25-Mar-06 10:37  aaron      Note Added: 0001061
25-Mar-06 11:54  sayler     Note Added: 0001064
25-Mar-06 17:58  paul       Note Added: 0001066
14-Apr-06 04:37  maenaka    File Added: patch-dbmail-message.c
14-Apr-06 04:41  maenaka    Note Added: 0001082
30-Apr-06 09:25  aaron      Note Added: 0001132
30-Apr-06 17:22  paul       Note Added: 0001141
23-May-06 12:48  ryo        Note Added: 0001191
31-May-06 14:36  ryo        Note Added: 0001203
======================================================================