(I posted this to https://bugzilla.mozilla.org/show_bug.cgi?id=558528#c46 as a permanent record, but I am posting this to a couple of newsgroups with a few typo fixes and added comments to solicit a solution re how to fix this performance issue from a wider audience.)
I have spotted a very conspicuous offender of not using buffered output where it can in comm-central thunderbird. My questions to the list is as follows. I am not familiar with I/O code within mozilla code base. Is there a way to set the buffer-mode of |m_fileStream->Write| inside CopyDate [see below] (just in case there are other wayward callers)? Or can we simply replace this |Write| with a buffered output version, but what that would be? I think as long as a buffer of reasonable size, even if it is 4KB or 16KB, is used, I/O performance will be much better and reasonably good. (especially for a remote file system, or imap case.) It does not have to be 128KB or much larger something, I think, although it would help for a local disk write and read. However, here I report that I have found a routine is copying message data WITHOUT ANY BUFFERING at all when it WRITES the output LINE BY LINE. A REAL PROBLEM: Imagine if the message has a few MB of binary attachment as mime data (PDF, JPEG photo, etc.). Typically, the attachment in the message is encoded as mime data, each line smaller than 80 octets. TB seems to send mime as 74 octets line including the ending. So, say, for a 2MB bytes of attachment, assuming that base64 increases the size of characters to be sent by 1.33, approximately 1.33*2MB / 72 (subtracting two from 74 for CRLF) = 38739 write system calls are issued by TB on the receiving side. 38739 |write| system calls. NO WONDER TB feels slow. The copying feels slow when my mail is stored on local disk. Imagine someone who has to use a remote file system in a corporate setting for mail store. He/she would be really penalized with so excessive # of write system calls with small chunk of data over the wire. These write calls ought to be merged by proper buffering! Even a small 4K buffer would cut down the # of system calls about 55 times, not too bad. Enough preamble for giving the background information. The culprit routine I found is NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) in the file comm-central/mailnews/local/src/nsLocalMailFolder.cpp And the problem unbuffered write in question is http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#2111 2111 rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); This does not seem to use buffering at all. (Please check the excerpt of system call trace from strace under linux attached in the following.). I agree that, since this function seems to be in need of scanning the data for "From" at the beginning, and escaping "From" at the beginning of line, etc., it needs to SCAN the INPUT line by line, but there is NO REASON AT ALL TO USE UNBUFFERED *OUTPUT*! Repeat-by: Monitor and record the system calls issued by TB. I used strace under linux. strace -o /tmp/t.out TB-BINARY Start TB, and then copy a message with reasonably large binary attachment in one folder to another folder. I found the repetition of small write(fd, bug, len) with length parameter 74 MANY times. E.g.: Excerpt from a typical strace record. I only excerpted the interesting lines. There are many other system calls. The particular message I copied had an attachment of 1.2MB size. The repeated writes() have the first parameter (file descriptor) as 54. I searched backward in trace file and found that FD 54 is for the target folder. (folder-C) ... open("/home/ishikawa/.thunderbird/u1ku1i3y.default/Mail/Local Folders/folder-C", O_RDWR|O_CREAT, 0664) = 54 ... Extraction of the initial part of the writes to this file descriptor: (I think I created the message using HTML format mail. That is why you see a few lines of HTML main message body. But that is not of any concern. Concern is the long repetition of mime data that follows.) ... write(54, "From - Sun Nov 22 11:35:01 2009\r"..., 33) = 33 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status: 0001\r\n", 24) = 24 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status2: 10000000\r\n", 29) = 29 .... I omit clock_gettime() below ... write(54, "X-Mozilla-Keys: $label3 "..., 97) = 97 write(54, "FCC: mailbox://nobody@Local%20Fo"..., 44) = 44 write(54, "X-Identity-Key: id1\r\n", 21) = 21 write(54, "Message-ID: <4B08A02E.7080201@ex"..., 44) = 44 write(54, "Date: Sun, 22 Nov 2009 11:35:01 "..., 39) = 39 write(54, "From: ishikawa <ishikawa@example"..., 39) = 39 write(54, "X-Mozilla-Draft-Info: internal/d"..., 70) = 70 write(54, "User-Agent: Thunderbird 2.0.0.23"..., 49) = 49 write(54, "MIME-Version: 1.0\r\n", 19) = 19 write(54, "To: nob...@example.com\r\n", 24) = 24 write(54, "Subject: libc-2.7.so 11:35\r\n", 28) = 28 write(54, "Content-Type: multipart/mixed;\r\n", 32) = 32 write(54, " boundary=\"------------000003080"..., 50) = 50 write(54, "\r\n", 2) = 2 write(54, "This is a multi-part message in "..., 46) = 46 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: text/html; charset"..., 45) = 45 write(54, "Content-Transfer-Encoding: 7bit\r"..., 33) = 33 write(54, "\r\n", 2) = 2 write(54, "<!DOCTYPE html PUBLIC \"-//W3C//D"..., 65) = 65 write(54, "<html>\r\n", 8) = 8 write(54, "<head>\r\n", 8) = 8 write(54, "</head>\r\n", 9) = 9 write(54, "<body bgcolor=\"#ffffff\" text=\"#0"..., 41) = 41 write(54, "<br>\r\n", 6) = 6 write(54, "<br>\r\n", 6) = 6 write(54, "</body>\r\n", 9) = 9 write(54, "</html>\r\n", 9) = 9 write(54, "\r\n", 2) = 2 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: application/octet-"..., 41) = 41 write(54, " name=\"libc-2.7.so\"\r\n", 21) = 21 write(54, "Content-Transfer-Encoding: base6"..., 35) = 35 write(54, "Content-Disposition: attachment;"..., 34) = 34 write(54, " filename=\"libc-2.7.so\"\r\n", 25) = 25 write(54, "\r\n", 2) = 2 write(54, "f0VMRgEBAQAAAAAAAAAAAAMAAwABAAAA"..., 74) = 74 write(54, "AAA0AAAANAAAADQAAABAAQAAQAEAAAUA"..., 74) = 74 write(54, "BAAAAAEAAAABAAAAAAAAAAAAAAAAAAAA"..., 74) = 74 write(54, "EwCcJwAAcFQAAAYAAAAAEAAAAgAAAJyd"..., 74) = 74 ... MANY lines of write with 74 octets.... write(54, "--------------000003080301040107"..., 42) = 42 write(54, "\r\n", 2) = 2 close(54) = 0 Observation: For this message with 1.2MB attachment, I counted 24016 write calls. (fgrep "write(54," /tmp/t.out | wc) Granted that the routine seems to be doing something about seeing "From", escaping "From" with ">From", etc., but it has *NO* reason to use UNBUFFERED WRITE here. It can buffer the output and then finally at the end of the loop it can flush or something. HOW TO FIX? I am not familiar with I/O routine inside mozilla code base. I am not sure where this file stream is opened [See the stack backtrace at the end.] and why it is not using buffered mode. rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); Is there a way to set the buffer-mode to m_fileStream inside CopyData (just in case there are multiple wayward callers)? Or can we simply replace this Write with a buffered output version, but what that function would be? [The stack trace is shown below for those who want to chase what is calling this function.] The above is for copying an existing message to another folder. (So it can happen offline.) When I look up CopyData in mxr database, I found another instance of CopyData (for imap) and that seems to have the same unbuffered output issue. (Or at least superficially the code looks the same. Unless "->Write" semantics is different (i.e., unbuffered vs buffered) we have the crawling I/O performance when we copy / download large message (when we write to a local folder) when imap is used, too! MXR: E.g. Other place(s) where CopyData was found: I searched the identifier CopyData using mxr: MXR is slightly buggy, I think: I only left mail/mailnews-related entries. Defined as a function in: mailnews/imap/src/nsImapMailFolder.cpp (View Hg log or Hg annotations) line 3343, as member of class nsImapMailFolder -- NS_IMETHODIMP nsImapMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This function, CopyData calls another function CopyDataToOutputStreamForAppend() and in it, we have 3303 { 3304 rv = outputStream->Write(start, 3305 end-start, 3306 &writeCount); 3307 rv = outputStream->Write(CRLF, 2, &writeCount); 3308 } We should make sure that these writes are buffered! (and flush at the end of the loop/return.) From what I read before about the slowness of I/O in a comment posted by IMAP user, probably these are NOT buffered, either. AND, we SHOULD CHECK the return value rv: there is NO ERROR checking here !? No wonder there are reports of IMAP users losing messages, etc. in the past? [Added comment: even if these UNBUFFERED writes failed, close will succeed! So checking here is very important! ] Defined as a function prototype in: mailnews/base/public/nsICopyMessageListener.idl (View Hg log or Hg annotations) line 20 -- void copyData(in nsIInputStream aIStream, in long aLength); Defined as a variable in: mailnews/local/src/nsLocalMailFolder.cpp (View Hg log or Hg annotations) line 1830 -- rv = CopyData(inputStream, (int32_t) fileSize); line 2026 -- NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This is the function I found out about and report here. Again, the questions I have are >Is there a way to set the buffer-mode to m_fileStream inside CopyDate > (just in case there are other wayward callers)? >Or can we simply replace this Write with a buffered output version? See Related bugs: Bug 558528 - Larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network drives/shares). Use of Necko's buffer size, buffer size user can easily customize. Well, I have not used the aceman's patch in there yet since I wanted to find out where the offending code for calling write() repeatedly with smallish 74 octet length originated. I noticed this repeated write() of smallish size earlier this year when someone reported a serious issue when his mail store is on a remote file system (CIFS) and he seems to have used imap. I tried to re-create his problem locally by mounting Windows 7 file system from my local linux installation and in so doing noticed a few issues with CIFS code itself during a transient network error (which seems to have been fixed nicely), and a missing error code or two on TB's side, and in so doing noticed a trace of MANY write system calls with short (70+ octets) on wire. Such many small write calls over wire can kill performance easily, and the missing error handling was bad :-( BTW, :aceman's patch there seems to handle many "read" side of the issue. The problem I am reporting is "write" side. Bug 494238 - (jetpack-panel-apps) ability for a jetpack to open a panel I wanted to use the great profiler ( see https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G ). But it never seemed to work for me when I tried to use it against C-C TB until lately. But finally, thanks to https://bugzilla.mozilla.org/show_bug.cgi?id=494238#c47 it seems to work! (But I have not yet used that profiler for this particular performance issue because it has been only a day since I could get it working after a long fiasco with bug regarding loadinfo, Bug 1111304 - assertion failure loadinfo, this month. ) I wanted to get out the words fast so that something can be done during the holidays for this serious performance issue. But once I become familiar with the profiler and can offer performance figures using the profiler, I will. It looks the profiler could have saved me in my efforts to capture the traceback mentioned below. I see a traceback in the profile window somehow. This is the stack trace of C-C TB when the write with this small chunk is invoked. (gdb) where #0 write () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7fb8740 in pt_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/pthreads/ptio.c:1315 #2 0x00007ffff04beccc in (anonymous namespace)::interposedWrite ( aFd=<optimized out>, aBuf=<optimized out>, aAmt=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/NSPRInterposer.cpp:68 #3 0x00007ffff7f9f0b9 in PR_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/io/priometh.c:114 #4 0x00007ffff009ce03 in nsMsgFileStream::Write (this=<optimized out>, buf=<optimized out>, count=<optimized out>, result=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgFileStream.cpp:154 #5 0x00007ffff02ace8d in nsMsgLocalMailFolder::CopyData ( this=<optimized out>, aIStream=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsLocalMailFolder.cpp:2111 #6 0x00007fffeffa089f in nsCopyMessageStreamListener::OnDataAvailable ( this=<optimized out>, ctxt=<optimized out>, aIStream=<optimized out>, sourceOffset=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsCopyMessageStreamListener.cpp:86 #7 0x00007ffff02c8a48 in nsMailboxProtocol::ReadMessageResponse ( this=<optimized out>, inputStream=<optimized out>, sourceOffset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:594 #8 0x00007ffff02c8f0b in nsMailboxProtocol::ProcessProtocolState ( this=<optimized out>, url=<optimized out>, inputStream=<optimized out>, offset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:679 #9 0x00007ffff00c1399 in nsMsgProtocol::OnDataAvailable ( this=<optimized out>, request=<optimized out>, ctxt=<optimized out>, inStr=<optimized out>, sourceOffset=<optimized out>, count=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgProtocol.cpp:353 #10 0x00007ffff0572bfe in nsInputStreamPump::OnStateTransfer ( this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:609 #11 0x00007ffff0573f5d in nsInputStreamPump::OnInputStreamReady ( this=<optimized out>, stream=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:436 #12 0x00007ffff04751d8 in nsInputStreamReadyEvent::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp:88 #13 0x00007ffff049cf6d in nsThread::ProcessNextEvent (this=<optimized out>, aMayWait=<optimized out>, aResult=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp:855 #14 0x00007ffff04cfba3 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/glue/nsThreadUtils.cpp:265 #15 0x00007ffff08c4356 in mozilla::ipc::MessagePump::Run ( this=<optimized out>, aDelegate=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/glue/MessagePump.cpp:99 #16 0x00007ffff0885029 in MessageLoop::RunInternal (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:233 #17 0x00007ffff08853fc in RunHandler (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:226 #18 MessageLoop::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:200 #19 0x00007ffff28827b3 in nsBaseAppShell::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/widget/nsBaseAppShell.cpp:164 #20 0x00007ffff33d61f9 in nsAppStartup::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/components/startup/nsAppStartup.cpp:281 #21 0x00007ffff345e1ee in XREMain::XRE_mainRun (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4153 #22 0x00007ffff345f9ab in XREMain::XRE_main (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4229 #23 0x00007ffff345fd7d in XRE_main (argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>, aFlags=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4449 #24 0x000000000040378e in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffffffd9f8, xreDirectory=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:195 #25 0x00000000004038b5 in main (argc=1, argv=0x7fffffffd9f8) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:380 (gdb) info register Yeah, right. I had to set a breakpoint to "write" system call and checked the registers (this is under Debian GNU/Linux 64-bit) and when the passed length parameter is 74 in, I hit the breakpoint and voila. It took me a while to figure out gdb's esoteric command expression syntax, and I have to skip the major part of the execution without setting up the break point (because with the conditional breakpoint, the execution speed is much slower [maybe error message written to stderr causes one |write| call per character and thus slows down TB execution very much], and hitting control-C to revert back to gdb prompt when TB is finally waiting for user interaction after showing the three window panes, and THEN set the conditional breakpoint break write if ($rdx == 74) to gdb prompt and continue, then hit the breakpoint to obtain the above stack trace. The great profiler I mentioned above seems to record the repeated write nicely (since it is sampled many times, and at the sampled time, the profiler seems to record the stack trace, and so it would have been much easier for me to figure out which higher level routine invokes short writes) TIA _______________________________________________ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform