On May 23, 2005 at 22:52, East Coast Coder wrote:

> After much thought, it seems my needs of 1) ThreadID's and 2) High
> performance, high capacity archives call for use of a regular SQL
> database, rather than messages.db.
>
> I'd still like to use Mhonarc to parse the mbox, and to parse the
> individual messages and convert them to HTML. Just I'd like to be
> able to hook it up to a SQL database (at least for the threadid,
> subject, and references - possibly for the to: and from: also).

The least intrusive way is to use MHonArc's callback API to get the
information you need to load into your custom database. For example,
you can set the $mhonarc::CBMessageHeadRead callback and extract the
information you want to load into your database. The API is described
in one of the appendix sections of the documentation.

Some problems may arise depending on what you ultimately want to do.
For example, data in the .mhonarc.db file is needed for the expansion
of numerous resource variables. Therefore, if you disable the writing
of the .mhonarc.db file, many resource variables will not work.
Depending on your needs, you can either customize the page layout
resources so they do not require those variables (which means you
would have to generate your own index pages and nav links) or register
a resource variable callback (see API docs) to handle the expansion of
resource variables (which is probably a cumbersome task).

I have recently put some thought into what it would take to replace
the flat-file database with something like Berkeley DB, but such an
effort ripples through much of mhonarc's code base. Berkeley DB would
allow for scalability, but much code would have to be changed (I think
doing tie tricks will not be sufficient).

If you want to minimize work, you could take the following approach:

  * Use the callback API to load key information into your SQL
    database, as noted above.

  * Use period-based archives (e.g. monthly) to keep archive updates
    manageable. This is how mharc works, and it is what other users
    have done to make their archives scalable.

    The period archives are sufficient for date-based navigation since
    the boundaries do not matter (mharc uses a simple CGI program to
    provide nav links between periods).

  * Customize the page layout resources to provide navigational links
    based on your SQL data. Since threading is the big issue, you can
    do something like the following:

    - Disable thread index generation in mhonarc, mainly because
      threads will be "broken" at period boundaries. This is doable
      via a resource setting.

    - Remove mhonarc's built-in thread nav links in message pages
      (same reason as the previous item). This is doable via resource
      settings.

    - Create your own custom thread indexes based upon your database
      data. These indexes may be generated via CGI/dynamic programs
      that query your database at runtime.

    - Create your own thread nav links in message pages based upon
      your own database data. You can modify the page layout resource
      to include markup pointing to a CGI (or similar) URL that
      determines the next and previous items by thread and/or shows
      the entire thread summary.

A subtle technical issue is resolving your database data to the
correct message file. That is, your threading code needs to know how
to map to the correct filename/URL, and it needs to deal with the fact
that message files may span multiple directories (due to the
period-based archive layout).

In your callback, you can use the $mhonarc::OUTDIR variable to get the
output path specified to mhonarc when it was invoked. Therefore, when
calling mhonarc, make sure OUTDIR is set to a value that you can map
to valid URLs.

As for the base filename of the message, you will need to know the
message number mhonarc will assign to the message (remember, message
file names are based upon the message number). Unfortunately, the
message number is not explicitly provided to you when
$mhonarc::CBMessageHeadRead is invoked. To get the assigned message
number, you can use the following in your callback routine:

    $msg_num = $mhonarc::LastMsgNum+1;

Assuming you are using the default resource values for message prefix
and suffix, you can get the filename with:

    $msg_base_filename = sprintf('msg%05d.html', $msg_num);

The above approach does not remove the use of the .mhonarc.db files,
but it should deal with the scalability problem while providing the
functionality you desire.

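To make the filename/URL mapping concrete, here is a rough sketch of
what such a callback might look like. Treat it as an illustration
under assumptions, not tested integration code: the helper names
(next_msg_filename, msg_url, record_message), the @pending_rows array,
and the '/var/www/archives' and '/archives' values are all made up for
the example. It also assumes that a true return value keeps the
message in the archive, so check the callback appendix before relying
on that. Only $mhonarc::OUTDIR, $mhonarc::LastMsgNum, and the sprintf
format come from the discussion above.

```perl
use strict;
use warnings;

# Rows waiting to be written to the SQL database (hypothetical staging area).
our @pending_rows;

# Hypothetical helper: base filename mhonarc is expected to assign to the
# message currently being read. Assumes the default message prefix/suffix.
sub next_msg_filename {
    my ($last_msg_num) = @_;
    return sprintf('msg%05d.html', $last_msg_num + 1);
}

# Hypothetical helper: turn an output directory into a URL path by swapping
# the on-disk archive root for a URL base. Both roots are local assumptions.
sub msg_url {
    my ($outdir, $filename, $archive_root, $url_base) = @_;
    (my $rel = $outdir) =~ s{^\Q$archive_root\E/*}{};   # strip the disk root
    $rel =~ s{/+$}{};                                   # drop trailing slashes
    return join('/', grep { length } $url_base, $rel, $filename);
}

# Sketch of the callback body. Under mhonarc, $mhonarc::OUTDIR and
# $mhonarc::LastMsgNum are set by mhonarc itself; the header argument is
# accepted but not inspected here, since its exact layout should be taken
# from the API appendix.
sub record_message {
    my ($fields) = @_;
    no warnings 'once';
    my $file = next_msg_filename($mhonarc::LastMsgNum);
    my $url  = msg_url($mhonarc::OUTDIR, $file,
                       '/var/www/archives', '/archives');
    # A real callback would INSERT this row (plus subject, references,
    # etc.) into the SQL database instead of staging it in memory.
    push @pending_rows, { file => $file, url => $url };
    return 1;   # assumed: a true value tells mhonarc to keep the message
}

# Registration, when running inside mhonarc:
#   $mhonarc::CBMessageHeadRead = \&record_message;
```

With OUTDIR set to something like /var/www/archives/2005-05, each
stored row then carries a URL of the form
/archives/2005-05/msgNNNNN.html, which is what your thread index and
thread nav CGI would need in order to link across period directories.
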
If the above approach is not sufficient for your needs, then a more
detailed technical discussion is required, along with the
consideration of a major redesign/upgrade of the mhonarc code base.

--ewh