(I'm back from out of town and catching up on email.)

On April 12, 2007 at 20:34, "Jeff Breidenbach" wrote:

> Does this look reasonable to people? Anything obviously
> weird?
>
> Total Elapsed Time = 7.334524 Seconds
>   User+System Time = 3.794524 Seconds
> Exclusive Times
> %Time ExclSec CumulS #Calls sec/call Csec/c  Name
>  20.3   0.773  1.455      7   0.1104 0.2079  mhonarc::sort_messages

Sorting does not surprise me. MHonArc does not keep a persistent sorted
data structure, so it re-sorts every time new messages are added (under
the assumption that messages may arrive in arbitrary order). This can
definitely be painful if one updates an archive on-the-fly versus using
a queuing-batch model. In the latter, multiple messages may be added in
a single invocation, avoiding a re-sort for each message added.

Do you invoke mhonarc for each new message for a list, or do you queue
up messages for a given list (over a specified period) before invoking
mhonarc for the list?

Note that sorting includes thread sorting, which is the most
complicated part. Some speed increase may be possible by disabling
SUBJECTTHREADS (this is mentioned in the Performance Tips doc). However,
disabling SUBJECTTHREADS may hurt usability for messages that fail to
define the proper reference headers.

For large-scale usage, a (robust) persistent data structure is needed.
However, such a structure would require a redesign of mhonarc internals.

>  18.6   0.707  0.707 446811   0.0000 0.0000  mhonarc::get_time_from_index

This is due to the Perl 4 legacy code base. The unique index for each
message also contains the date-time stamp applicable to the message.
It may be possible to add a new hash that just maintains the date-time
information, to avoid the split() operation each time
get_time_from_index is invoked. This will increase the database size
(and the in-memory size), but the increase may be negligible in the
grand scheme of things.
I think when mhonarc was first written (and it was not called mhonarc
at the time), I favored reducing the number of hashes used over
performance gains, since performance was not a real issue: I did not
foresee mhonarc being used at such a large scale.

>  14.7   0.558  0.558   4805   0.0001 0.0001  MHonArc::RFC822::tokenise

This code is non-trivial since it does full RFC-822 parsing. Older
versions of mhonarc used a simpler parsing routine, but a more robust
routine was required as mhonarc evolved (and to address bugs in email
name and address extraction).

>  14.4   0.548  2.264  13800   0.0000 0.0002  mhonarc::replace_li_var

Minimizing variable usage in resource files is the main way to reduce
the calls to this routine. However, resource file maintenance concerns
may trump any performance gain.

>  5.09   0.193  0.193  13037   0.0000 0.0000  mhonarc::compute_msg_pos

This is part of resource variable resolution. See
<http://www.mhonarc.org/MHonArc/doc/guides/performance.html#mesg_spec>
on how to minimize the performance impact of this routine.

>  4.77   0.181  0.561   9538   0.0000 0.0001  MHonArc::UTF8::Encode::clip

This is actually more efficient than using the default
CHARSETCONVERTERS model. That is, encoding everything to UTF-8 is more
efficient (assuming proper resource settings). In MHonArc's default
configuration, charset conversion can be very costly when dealing with
non-ASCII messages. I discovered this years ago while profiling
MHonArc, after performance complaints were raised when more extensive
charset routines were added.

>  4.48   0.170  0.319      1   0.1700 0.3193  mhonarc::get_resources

This loads in the resource file(s).

--ewh