Re: [Dbmail] single instance mime storage

Jonathan Feally Mon, 04 Jun 2007 22:09:28 -0700

Just to be clear - are you taking the headers of the mime part and keeping them separate from the body of said part? My original post to keep them separate was to allow for the attached file to be renamed without creating duplicates, and also some mail clients may create the headers of the mime part slightly differently when taking the file and re-attaching it into a new message.


One mail client may attach a file with headers like:


--------------090708070905030201070605
Content-Type: image/jpeg;
 name="picture.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename="picture.jpg"

While a different client may attach the same file with headers like:

--------------090708070905030201070606
Content-Type: image/jpeg; name="picture.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="picture.jpg"

If the headers weren't separate from the body, we would end up with 2 copies of this file (2 full parts), which would take up more space than 2 parts for the unique headers and only 1 part for the file.
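To illustrate the point, here is a minimal Python sketch, not dbmail code. The names `split_part`, `store_part`, `header_store` and `body_store` are made up for the example, and it assumes both clients produce a byte-identical base64 body (clients that wrap base64 at different line lengths would still produce two bodies).

```python
import hashlib

def split_part(part):
    """Split one MIME part into (headers, body) at the first blank line."""
    headers, _, body = part.partition("\n\n")
    return headers, body

def store_part(part, header_store, body_store):
    """Store headers per part, but store each distinct body only once."""
    headers, body = split_part(part)
    body_key = hashlib.sha1(body.encode("ascii")).hexdigest()
    body_store.setdefault(body_key, body)      # no-op if the body is already stored
    header_store.append((headers, body_key))   # headers always get their own row
    return body_key

# Hypothetical (truncated) base64 payload shared by both parts.
BODY = "/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAMCAgMCAgMDAwME\n"

part_a = (
    'Content-Type: image/jpeg;\n'
    ' name="picture.jpg"\n'
    'Content-Transfer-Encoding: base64\n'
    'Content-Disposition: inline;\n'
    ' filename="picture.jpg"\n'
    '\n' + BODY
)
part_b = (
    'Content-Type: image/jpeg; name="picture.jpg"\n'
    'Content-Transfer-Encoding: base64\n'
    'Content-Disposition: inline; filename="picture.jpg"\n'
    '\n' + BODY
)

header_store = []
body_store = {}
key_a = store_part(part_a, header_store, body_store)
key_b = store_part(part_b, header_store, body_store)

# Hashing each part as a whole would give two different keys instead.
whole_a = hashlib.sha1(part_a.encode("ascii")).hexdigest()
whole_b = hashlib.sha1(part_b.encode("ascii")).hexdigest()
```

In this sketch `header_store` ends up with two entries pointing at the single entry in `body_store`, while `whole_a` and `whole_b` differ.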

Dropping of the boundaries - Multiple boundaries in a single piece of mail could present a problem - such as an attached message that was already mime. Complete retrieval of the message would require decoding of the attached message headers on-the-fly. I propose that any time a single mime part is to be retrieved, the boundary is stripped off on-the-fly when the client is not expecting it to be in the returned data. This allows the entire message to still be retrieved without magic and with less computing overhead. I personally have a script that reads messages put into public spam/not spam folders out of the database for auto spam learning. Taking the boundaries out would require me to change the script from pure sh and sql to something like php and imap functions to get the same data.
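A rough Python sketch of the stripping idea, assuming the stored chunk keeps its delimiter line as its first line. The function `strip_boundary` and the sample chunk are invented for illustration and are not part of dbmail.

```python
def strip_boundary(chunk, boundary):
    """Return the chunk without its leading delimiter line, if it has one."""
    first_line, _, rest = chunk.partition("\n")
    if first_line.rstrip("\r") == "--" + boundary:
        return rest
    return chunk  # no delimiter present: hand the chunk back unchanged

# Delimiter line as it appears in the message; the boundary itself is the
# same string without the leading "--".
DELIMITER_LINE = "--------------090708070905030201070605"
BOUNDARY = DELIMITER_LINE[2:]

# Hypothetical stored chunk: delimiter line, part headers, blank line, body.
stored = DELIMITER_LINE + "\n" + "Content-Type: text/plain\n\nhello\n"

# Single-part fetch: the client does not expect the delimiter.
single = strip_boundary(stored, BOUNDARY)
```

The stored chunk is left untouched, so a plain concatenation of the chunks still yields the whole message.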

Hashing - hashing of the entire data chunk is probably the safest way to go, but allowing multiple hash types to be used - such as tagging the hash as {sha1}thehashdatahere - would not help to keep the unique parts stored uniquely, as a hash from one method would not match a hash from a different method.
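A small Python sketch of why mixed hash types defeat the deduplication. The `tagged_hash` helper and the choice of sha256 as the second method are assumptions made for the example only.

```python
import hashlib

def tagged_hash(data, method):
    """Build a lookup key of the form {method}hexdigest."""
    digest = hashlib.new(method, data).hexdigest()
    return "{%s}%s" % (method, digest)

chunk = b"same attachment bytes"

key_sha1 = tagged_hash(chunk, "sha1")
key_sha256 = tagged_hash(chunk, "sha256")

# A store keyed on the tagged hash sees two different keys for one chunk.
store = {}
store.setdefault(key_sha1, chunk)
store.setdefault(key_sha256, chunk)
```

Here `store` holds the same bytes twice, once under each key, which is the duplication a single fixed hash method avoids.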


-Jon

Paul J Stevens wrote:

I think this discussion should move to dbmail-dev.

I've already finished most of the singleton mimechunk storage :-)

A bit raw around the edges, and only for sqlite atm. Stay tuned.

Fully backward compatible: just add two tables like I stated. Using sha1 over
the whole of the file, fall back to old-style storage if new-style not
available. I guess the one doing the work gets to make the decisions...

In this new setup, simple rfc2822 messages are stored in a single block. Yes: no
more chopping off of the headers. For multi-part messages, the first block
contains the rfc headers plus the mime-preamble. Following blocks contain the
mime-parts as-is.

Boundaries between parts are reconstructed at retrieval based on the boundary
used in the original message and stored in the first block.
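As a sketch of what that reconstruction could look like (this is not the actual dbmail code): the Python below assumes the first block carries a `boundary=` parameter in its Content-Type header, that every block ends with a newline, and that bare "\n" line endings are used for brevity. The names `extract_boundary` and `reassemble` are hypothetical.

```python
import re

def extract_boundary(first_block):
    """Pull the boundary parameter out of the headers in the first block."""
    match = re.search(r'boundary="?([^";\r\n]+)"?', first_block, re.IGNORECASE)
    if match is None:
        raise ValueError("no boundary parameter in first block")
    return match.group(1)

def reassemble(blocks):
    """Rebuild a message from its stored blocks."""
    first, parts = blocks[0], blocks[1:]
    if not parts:
        return first  # simple message: stored as a single block
    boundary = extract_boundary(first)
    out = [first]
    for part in parts:
        out.append("--" + boundary + "\n" + part)  # delimiter before each part
    out.append("--" + boundary + "--\n")           # closing delimiter
    return "".join(out)

first = (
    'Subject: picture\n'
    'Content-Type: multipart/mixed;\n'
    ' boundary="XYZ"\n'
    '\n'
    'This is a multi-part message in MIME format.\n'
)
parts = [
    'Content-Type: text/plain\n\nhello\n',
    'Content-Type: image/jpeg\nContent-Transfer-Encoding: base64\n\n/9j/4AAQ\n',
]

message = reassemble([first] + parts)
```

Nested multiparts (an attached message that is itself mime) would need the same step applied recursively, which this sketch does not attempt.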

I'm using the sha1 code from the mozilla project (dual licence). It's what Linus
uses in the GIT code, so it *must* be fast.

Are we having fun yet?


_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
