Just to be clear - are you taking the headers of the MIME part and keeping them separate from the body of said part? My original suggestion to keep them separate was meant to allow an attached file to be renamed without creating duplicates. Also, some mail clients may format the headers of the MIME part slightly differently when taking the file and re-attaching it to a new message.

One mail client may attach a file with headers like:

--------------090708070905030201070605
Content-Type: image/jpeg;
 name="picture.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename="picture.jpg"

While a different client might attach the same file like this:

--------------090708070905030201070606
Content-Type: image/jpeg; name="picture.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="picture.jpg"

If the headers weren't separate from the body, we would end up with 2 copies of this file (2 complete parts), which would take up more space than 2 parts for the unique headers and only 1 part for the file.
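
As a rough illustration of the point above (not actual dbmail code), the sketch below splits each part at the first blank line and keys headers and body separately by their sha1. The names split_part, store_part, header_store and body_store are made up for the example, and the dicts stand in for database tables.

```python
import hashlib

def split_part(raw):
    """Split a raw MIME part into (headers, body) at the first blank line."""
    raw = raw.replace(b"\r\n", b"\n")
    headers, _, body = raw.partition(b"\n\n")
    return headers, body

def store_part(raw, header_store, body_store):
    """Store headers and body under separate sha1 keys (hypothetical stores)."""
    headers, body = split_part(raw)
    hkey = hashlib.sha1(headers).hexdigest()
    bkey = hashlib.sha1(body).hexdigest()
    # setdefault only inserts when the key is new, so identical data is kept once
    header_store.setdefault(hkey, headers)
    body_store.setdefault(bkey, body)
    return hkey, bkey

# Placeholder base64 payload, identical in both parts
BODY = b"/9j/4AAQSkZJRgABAQEASABIAAD\n"

part_a = (b'Content-Type: image/jpeg;\n name="picture.jpg"\n'
          b'Content-Transfer-Encoding: base64\n'
          b'Content-Disposition: inline;\n filename="picture.jpg"\n\n' + BODY)
part_b = (b'Content-Type: image/jpeg; name="picture.jpg"\n'
          b'Content-Transfer-Encoding: base64\n'
          b'Content-Disposition: inline; filename="picture.jpg"\n\n' + BODY)

header_store, body_store = {}, {}
key_a = store_part(part_a, header_store, body_store)
key_b = store_part(part_b, header_store, body_store)
```

With the two differently folded header blocks from the example, this ends up with two header entries but only one body entry. Hashing the whole part, headers included, would give two different keys and store the file twice.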

Dropping of the boundaries - Multiple boundaries in a single piece of mail could present a problem, such as an attached message that was already MIME. Complete retrieval of the message would require decoding the attached message's headers on-the-fly. I propose that whenever a single MIME part is retrieved, the boundary is stripped off on-the-fly if the client is not expecting it in the returned data. This allows the entire message to still be retrieved without magic and with less computing overhead. I personally have a script that reads messages placed in public spam/not-spam folders out of the database for automatic spam learning. Taking the boundaries out of the stored data would require me to change the script from pure sh and SQL to something like PHP and IMAP functions to get the same data.

Hashing - hashing the entire data chunk is probably the safest way to go. Allowing multiple hash types, such as tagging the hash as {sha1}thehashdatahere, would not help keep unique parts stored uniquely, though, since a hash produced by one method would never match the hash of the same data produced by a different method.
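
A small sketch of why mixed hash types defeat the deduplication, assuming the stored key is simply the tag plus the hex digest. The function name tagged_hash is made up for the example, and sha256 is only there as an arbitrary second algorithm.

```python
import hashlib

def tagged_hash(data, algo):
    """Return a key like {sha1}<hexdigest> for the given data chunk."""
    return "{%s}%s" % (algo, hashlib.new(algo, data).hexdigest())

chunk = b"the same attachment bytes"

key_sha1 = tagged_hash(chunk, "sha1")
key_sha256 = tagged_hash(chunk, "sha256")

# Same data, same algorithm: the lookup finds the existing row.
same_algo_match = tagged_hash(chunk, "sha1") == key_sha1
# Same data, different algorithm: the lookup misses and a second copy is stored.
cross_algo_match = key_sha1 == key_sha256
```

Here same_algo_match is True and cross_algo_match is False, so a chunk already stored under one hash type would be stored again whenever it arrives under another.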

-Jon

Paul J Stevens wrote:

I think this discussion should move to dbmail-dev.

I've already finished most of the singleton mimechunk storage :-)

A bit raw around the edges, and only for sqlite atm. Stay tuned.

Fully backward compatible: just add two tables like I stated. Using sha1 over
the whole of the file, fall back to old-style storage if new-style not
available. I guess the one doing the work gets to make the decisions...

In this new setup; simple rfc2822 messages are stored in a single block. Yes: no
more chopping off of the headers. For multi-part messages, the first block
contains the rfc headers plus the mime-preamble. Following blocks contain the
mime-parts as-is.

Boundaries between parts are reconstructed at retrieval based on the boundary
used in the original message and stored in the first block.
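
A minimal sketch of the block layout described above, not the actual dbmail implementation. It assumes a well-formed, non-nested multipart message with a closing delimiter and no epilogue; the helper names find_boundary, split_message and reassemble are invented for the example.

```python
import re

def find_boundary(first_block):
    """Pull the boundary parameter out of the stored first block, if any."""
    m = re.search(r'boundary="?([^";\r\n]+)"?', first_block, re.IGNORECASE)
    return m.group(1) if m else None

def split_message(raw):
    """Return blocks: [headers + preamble, part 1, part 2, ...].

    A simple message without a boundary is returned as a single block.
    """
    boundary = find_boundary(raw)
    if boundary is None:
        return [raw]
    pieces = raw.split("\n--" + boundary)
    # The last piece is the "--" of the closing delimiter; it is not stored.
    return pieces[:-1]

def reassemble(blocks):
    """Rebuild the message, re-inserting boundaries between the stored blocks."""
    boundary = find_boundary(blocks[0])
    if boundary is None:
        return blocks[0]
    delim = "\n--" + boundary
    return delim.join(blocks) + delim + "--\n"

MESSAGE = (
    'From: a@example.com\n'
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    '\n'
    'preamble\n'
    '--XYZ\n'
    'Content-Type: text/plain\n'
    '\n'
    'hello\n'
    '--XYZ\n'
    'Content-Type: image/jpeg\n'
    '\n'
    '/9j/4AAQ\n'
    '--XYZ--\n'
)

blocks = split_message(MESSAGE)
rebuilt = reassemble(blocks)
```

For this sample, blocks holds three entries and rebuilt is identical to MESSAGE. Nested multipart messages and epilogue text would need more care than this sketch gives them.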

I'm using the sha1 code from the mozilla project (dual licence). It's what Linus
uses in the GIT code, so it *must* be fast.

Are we having fun yet?


_______________________________________________
DBmail mailing list
https://mailman.fastxs.nl/mailman/listinfo/dbmail