Re: [Dbmail] single instance mime storage

Paul J Stevens Tue, 05 Jun 2007 00:10:41 -0700

Jonathan Feally wrote:
> Just to be clear - are you taking the headers of the mime part and
> keeping them separate from the body of said part? My original proposal to keep
> them separate was to allow the attached file to be renamed without
> creating duplicates; also, some mail clients may create the headers of
> the mime part slightly differently when taking the file and re-attaching
> it to a new message.


My current setup keeps the mime headers together with the content. But
I've been thinking about it a little, and storing the actual content
separately from the headers is much better. Doing that right would
also mean recursing into the message/rfc822 attachments. That would
require some serious rethinking of the schema.
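
As a rough illustration of what "headers separate from content, with
recursion into message/rfc822" could look like, here is a minimal sketch.
It assumes Python's standard email package instead of gmime, and the names
split_parts and split_raw and the dotted path scheme are hypothetical; none
of this is the actual dbmail code or schema.

```python
from email import message_from_string
from email.message import Message


def split_parts(msg: Message, path: str = "1"):
    """Yield (path, headers, body) for every part of a message.

    headers is a list of (name, value) tuples; body is the decoded
    content as bytes, or None for container parts. The dotted path
    numbering is an illustrative assumption, not IMAP section numbering.
    """
    headers = msg.items()
    if msg.is_multipart():
        # Container: multipart/* holds a list of sub-parts, and the
        # parser exposes message/rfc822 as a one-element list holding
        # the attached message, so the same loop recurses into both.
        yield path, headers, None
        for i, sub in enumerate(msg.get_payload(), start=1):
            yield from split_parts(sub, f"{path}.{i}")
    else:
        # Leaf: the body is stored on its own, without the headers.
        yield path, headers, msg.get_payload(decode=True) or b""


def split_raw(raw: str):
    """Parse a raw message string and return all of its parts as a list."""
    return list(split_parts(message_from_string(raw)))
```

Each tuple could then become one row of headers plus a reference to a
separately stored body, which is where the schema change would come in.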


> If the headers weren't separate from the body, we would end up with 2
> copies of this file = 2 parts, which would take up more space than 2
> parts for the unique headers and only 1 part for the file.

Agreed. We don't want that.

> 
> Dropping of the boundaries - Multiple boundaries in a single piece of
> mail could present a problem - 

Not really.

> such as an attached message that was
> already mime. Complete retrieval of the message would require decoding
> of the attached message headers on-the-fly.

Yes. I'm currently doing my own retrieval of the content-type header,
using gmime to parse the actual content-type parameters. Doing this
while recursing into message/rfc822 attachments is not that much harder.
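
For readers without gmime at hand, the content-type parameter parsing
described above can be approximated as follows. This is only a sketch
using Python's standard email package as a stand-in; content_type_info is
a hypothetical helper name, and it does not reflect how dbmail calls gmime.

```python
from email import message_from_string
from email.message import Message


def content_type_info(part: Message):
    """Return (content_type, params) for one message or mime part.

    get_params() returns the type itself as the first entry, followed
    by the parameters, so that first entry is skipped. RFC 2231 encoded
    values come back as 3-tuples and are left undecoded here.
    """
    ctype = part.get_content_type()
    params = dict(part.get_params(failobj=[])[1:])
    return ctype, params
```

For a multipart part, params["boundary"] is the delimiter needed to
reassemble the message; for an attachment, params.get("name") is the
file name some clients put on the Content-Type header.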

> I propose that any time a
> single mime part is to be retrieved, the boundary is stripped off
> on-the-fly when the client is not expecting it to be in the returned
> data.

Clients *always* expect the /headers/ to come as part of the attachment,
and *never* the boundaries. The current IMAP code already does this correctly.

> This allows the entire message to still be retrieved without
> magic and with less computing overhead. I personally have a script that reads
> messages put into public spam/not spam folders out of the database for
> automatic spam learning. Taking the boundaries out would require me to change
> the script from pure sh and sql to something like php and imap functions
> to get the same data.

Yes, that is a likely outcome. But I'll try to get to a setup that will
allow message retrieval without complex parsing.


> 
> Hashing - hashing of the entire data chunk is probably the safest way to
> go, but allowing multiple hash types to be used - such as tagging the
> hash as {sha1}thehashdatahere - would not help to keep the unique parts
> stored uniquely, as one hash would not match a different method.

Agreed. I'll use SHA1 to generate unique ids of the attachments (and
maybe even other data as well: instant tamper-safe storage)
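
A minimal sketch of SHA1-keyed single-instance storage, assuming an
in-memory dict in place of a database table. The BlobStore class and its
method names are hypothetical and only illustrate the idea.

```python
import hashlib


class BlobStore:
    """Store each distinct chunk of content once, keyed by its SHA1."""

    def __init__(self):
        self.blobs = {}  # hex digest -> content bytes

    def put(self, data: bytes) -> str:
        """Store data if it is not already present and return its id."""
        key = hashlib.sha1(data).hexdigest()
        # Identical content always hashes to the same key, so a second
        # copy of the same attachment adds no new entry.
        self.blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

    def verify(self, key: str) -> bool:
        """Re-hash the stored content; a mismatch means it was altered."""
        return hashlib.sha1(self.blobs[key]).hexdigest() == key
```

Because the id is derived from the content, the same check that finds
duplicates also detects modified data, which is the tamper-evidence
mentioned above. Using a single hash method for every id keeps identical
parts from being stored twice under different keys.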



-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl