At 10:47 PM -0500 2003/10/29, Barry Warsaw wrote:

 I'm not sure if Andrew Koenig is on this list, but he described an
 algorithm he developed to quickly find messages in an mbox file.  If
 he's here, maybe he can talk about it.

7th edition mbox files are a pain. There are other mailbox file formats that are much better and easier to parse (UW-IMAP .mbx being one).
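To illustrate why mbox is awkward, here is a small sketch. Messages are delimited only by lines beginning with "From ", so any body line that starts that way has to be quoted on write and unquoted on read. It assumes the "mboxrd" quoting convention: one '>' is added to any line matching >*From on write, and one is removed on read. The function names and the dummy separator line are hypothetical, and separator detection is simplified.

```python
import re

# mboxrd convention (assumed): a body line matching ^>*From  gets one extra '>'
# on write, and a line matching ^>+From  loses one '>' on read.
_NEEDS_QUOTE = re.compile(r"^>*From ")
_QUOTED = re.compile(r"^>+From ")

# Hypothetical separator line; real ones carry the envelope sender and date.
SEPARATOR = "From sender@example.com Thu Jan  1 00:00:00 1970"


def escape_body(msg: str) -> str:
    """Quote lines that could be mistaken for a 'From ' separator."""
    return "\n".join(
        ">" + line if _NEEDS_QUOTE.match(line) else line
        for line in msg.split("\n")
    )


def unescape_body(msg: str) -> str:
    """Undo escape_body by stripping one '>' from quoted From lines."""
    return "\n".join(
        line[1:] if _QUOTED.match(line) else line
        for line in msg.split("\n")
    )


def build_mbox(messages: list[str]) -> str:
    """Concatenate messages, each preceded by a separator line."""
    parts = []
    for msg in messages:
        parts.append(SEPARATOR)
        parts.append(escape_body(msg))
    return "\n".join(parts)


def split_mbox(text: str) -> list[str]:
    """Split on unquoted 'From ' lines (simplified: ignores the
    blank-line-before-separator rule that real parsers often check)."""
    messages: list[str] = []
    current = None
    for line in text.split("\n"):
        if line.startswith("From "):
            if current is not None:
                messages.append("\n".join(current))
            current = []  # the separator line itself is not part of the message
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("\n".join(current))
    return messages
```

A round trip such as `[unescape_body(m) for m in split_mbox(build_mbox(msgs))]` should give back the original list. Note that the whole file still has to be scanned line by line to find message boundaries. Formats with explicit message lengths or per-message files avoid that cost.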


 I really don't like mbox files, primarily because they require munging
 From lines in the body of the message.  MMDF would be better, but I
 think ideal from a philosophical point of view would be
 one-message-per-file if it can be done efficiently cross-platform.

Therein lies the problem. Some filesystems make this more feasible than others, at least on larger scale systems.


 Maybe file system experts here can provide pointers or advice on exactly
 which file and operating systems make this approach feasible, even for
 huge message counts.

SGI's XFS on Irix does a pretty good job, with hashed directory structures and an extent-based journaling filesystem. Regrettably, I don't think all of these features are fully supported in the Linux version of XFS, and that work has basically ground to a halt now that all the key SGI people who had been working on XFS have been laid off. Veritas VxFS also does a good job in this area.


Other than SGI XFS for Irix and Veritas VxFS, I don't know of any good solutions to this problem at the filesystem level.


Kirk McKusick and Eric Allman agree with you that this is a proper filesystem problem that should be solved at the filesystem level (at least, that's what they've told me when I've brought the issue up with them), and they feel you should not try to solve filesystem problems with "tricks" like INN's timecaf/timehash/cycbuff storage methods.


That's nice in theory, but it doesn't necessarily help us here in the real world.
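In practice, the application-level workaround for one-message-per-file storage usually looks something like the following minimal sketch. Files are spread across hashed subdirectories so that no single directory grows large enough to hurt on filesystems without hashed directories. The layout, the function names and the parameters are all hypothetical: a SHA-1 of the Message-ID, with two directory levels of two hex digits each. This is not INN's timehash or timecaf format.

```python
import hashlib
import os
import tempfile


def message_path(root: str, message_id: str, levels: int = 2) -> str:
    """Map a Message-ID to root/ab/cd/<40-hex-digit digest> (hypothetical layout)."""
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # Each level uses two hex digits, so at most 256 entries per directory level.
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, digest)


def store_message(root: str, message_id: str, text: str) -> str:
    """Write one message per file, atomically via a temp file plus rename."""
    path = message_path(root, message_id)
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)  # atomic on POSIX within the same filesystem
    return path


def load_message(root: str, message_id: str) -> str:
    """Read a message back using the same hashed path."""
    with open(message_path(root, message_id), encoding="utf-8") as f:
        return f.read()
```

With two levels, messages are spread over up to 65,536 leaf directories. The trade-off is that the application, not the filesystem, now owns the directory layout, which is exactly the kind of trick McKusick and Allman object to.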

--
Brad Knowles, <[EMAIL PROTECTED]>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

_______________________________________________
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-developers
