On Mon, 2009-01-05 at 08:25 -0500, Jeffrey Stedfast wrote:
> migrating away from the IMAP specific data cache would be good.

Yes. I think IMAP and the local providers are the only ones that are
still using a specialized datacache. The IMAP4 provider, for example,
isn't using a specialized one.

> >> b) migrate away the mbox data cache (the all-in-one file crap)
> >>
> > I'm all for it. Once I thought of doing this, but the options were like
> > Maildir or a format of one mbox file per mail in a distributed folder
> > [CamelDataCache sort of format, like imap4/GW/Exchange]. But IIRC Fejj,
> > had some concern like, Local still might be good to be held in a
> > 'standards' way. I know it hurts us on expunge/mailbox rewrite etc.
> >
>
> what mbox data cache? CamelDataCache would probably be the best cache to
> use for IMAP.

That said, I would change CamelDataCache to store individual MIME parts
as separate files instead of files that look like a single-mail MBox
file.

I would also decode the separate MIME parts before storing them if the
original E-mail had them encoded (which is usually the case, and always
for binary attachments). This makes it easier for metadata engines to
index the MIME parts, and lets them do so efficiently.

It might also reduce disk space, as the encoded form consumes more of
it, but for me that is just a nice side effect.

So my format would create a directory for each E-mail, or prefix each
MIME part with the uid. Perhaps:

  INBOX/subfolders/temp/1.               // headers+multipart container
  INBOX/subfolders/temp/1.1              // multipart container
  INBOX/subfolders/temp/1.1.1            // text/plain
  INBOX/subfolders/temp/1.1.2            // text/html
  INBOX/subfolders/temp/1.2.1            // inline JPEG attachment
  INBOX/subfolders/temp/1.BODYSTRUCTURE  // Bodystructure of the E-mail
  INBOX/subfolders/temp/1.ENVELOPE       // Top envelope of the E-mail

ps. Perhaps I would store 1.BODYSTRUCTURE in the database instead. I
would probably store 1.ENVELOPE in the database (like it is now). On top
of storing BODYSTRUCTURE and ENVELOPE in the database, I would probably
also store them in separate files.
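
To make the proposed layout a bit more concrete, here is a rough sketch in Python rather than Camel's C. It is not how CamelDataCache works today. The helper names `iter_leaf_parts` and `store_parts` are hypothetical, and the part numbering (position in the MIME tree, prefixed with the uid) is a simplification that only approximates the listing above. It writes only the leaf parts, already decoded, and leaves out the container, BODYSTRUCTURE and ENVELOPE files.

```python
import os
from email import message_from_bytes


def iter_leaf_parts(msg, prefix=""):
    """Yield (part_path, part) for every non-multipart part, depth first.

    part_path is a dotted position in the MIME tree, e.g. "1.2".
    A mail without any multipart container gets the path "1".
    """
    if msg.is_multipart():
        for index, child in enumerate(msg.get_payload(), start=1):
            child_path = f"{prefix}.{index}" if prefix else str(index)
            yield from iter_leaf_parts(child, child_path)
    else:
        yield (prefix or "1"), msg


def store_parts(raw, uid, folder_dir):
    """Write each decoded leaf part to <folder_dir>/<uid>.<part_path>.

    Returns the list of written file names, in MIME tree order.
    """
    os.makedirs(folder_dir, exist_ok=True)
    msg = message_from_bytes(raw)
    written = []
    for part_path, part in iter_leaf_parts(msg):
        # decode=True undoes base64 / quoted-printable transfer encoding,
        # so the file on disk holds the raw bytes of the part.
        data = part.get_payload(decode=True) or b""
        target = os.path.join(folder_dir, f"{uid}.{part_path}")
        with open(target, "wb") as handle:
            handle.write(data)
        written.append(target)
    return written
```

With something like this, a mail with uid 1 that carries a text body and one attachment would end up as the files `1.1` and `1.2` in the folder directory, and the second one could be handed to a file-based tool as-is.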
I would keep those mini files even if most filesystems will consume 4k
or more (sector or block size) for each of them.

To get the JPEG attachment:

$ cp INBOX/subfolders/temp/1.2.1 ~/mommy.jpeg

$ exif INBOX/subfolders/temp/1.2.1
EXIF tags in 'INBOX/subfolders/temp/1.2.1' ('Intel' byte order):
--------------------+----------------------------------
Tag                 |Value
--------------------+----------------------------------
Image Description   |Mommy with cake at birthday
Manufacturer        |SONY
Model               |DSC-T33
...

$ tracker-search -s EMails birthday
Results:
  email://u...@server/INBOX/temp/1
  email://u...@server/INBOX/temp/1#2.1
  ~/mommy.jpeg

[CUT]

> this can cause problems if you need to verify signed parts because
> re-encoding them might not result in the same output.

Ok, for signatures I guess we can make an exception and keep them
encoded in their original format then.

> >> For Maildir I recommend wasting diskspace by storing both the original
> >> Maildir format and in parallel store the attachments separately.
> >>
> >> Maildir ain't accessible by current Evolution's UI, by the way.
> >>
> >> For MBox I recommend TO STOP USING THIS BROKEN FORMAT. It's insane with
> >> today's mailboxes that easily grow to 3 gigabytes in size per user.
> >>
> > I second your thoughts for MBox stuff.
> >
>
> Eh, I think mbox works fine but I can understand wanting to move to
> Maildir which is also fine :-)

Maildir doesn't store individual MIME parts separately. So Maildir is
just as hard for metadata engines to handle as MBox is. The only
difference is that with MBox we need to seek() to some location.

So Maildir doesn't make it possible for us to let app developers
implement indexing plugins easily, like a typical exif extractor. We
would have to Base64 decode image attachments before extracting exif,
for example, instead of just saying: here's a stream, or here's a FILE*,
go ahead and extract the info you want.
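
To illustrate the difference for a plugin author, here is a small Python sketch under stated assumptions. `extractor` stands for any hypothetical file-based tool wrapper that takes a path (the way `exif` does above), and both function names are made up for this example. With the whole mail stored encoded, the indexer has to parse the MIME structure and decode each part into a scratch file first. With decoded parts on disk, it just passes the path along.

```python
import os
import tempfile
from email import message_from_binary_file


def extract_from_encoded_mail(mail_path, extractor):
    """Maildir-style storage: parse, decode to scratch files, then extract."""
    with open(mail_path, "rb") as handle:
        msg = message_from_binary_file(handle)
    results = []
    with tempfile.TemporaryDirectory() as scratch:
        for index, part in enumerate(msg.walk()):
            if part.is_multipart():
                continue  # containers carry no payload of their own
            decoded_path = os.path.join(scratch, f"part-{index}")
            with open(decoded_path, "wb") as out:
                # Undo base64 / quoted-printable before the tool sees it.
                out.write(part.get_payload(decode=True) or b"")
            results.append(extractor(decoded_path))
    return results


def extract_from_decoded_part(part_path, extractor):
    """Per-part decoded storage: the file is already what the tool expects."""
    return extractor(part_path)
```

The first function is the extra work every indexer would have to carry with Maildir or MBox. The second is all that is left once the cache stores decoded parts.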
(With a stream we could make it relatively easy to auto-decode Base64,
but these extractors are often still FILE* based, not stream based.)

There's IMO not really a good reason to keep the attachments stored in
their encoded version. Except the signatures, perhaps, but we don't
really need those in decoded form anyway. So it would be fine to have an
exception for signatures (to keep them stored encoded).

Hmm, maybe someday having the fingerprint information about a person
might be useful to verify the identity of an individual before linking
the person with a contact in our RDF triple store.

--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be

_______________________________________________
Evolution-hackers mailing list
http://mail.gnome.org/mailman/listinfo/evolution-hackers