Michael Monnerie wrote:
> On Montag, 17. Dezember 2007 Daniel Urstöger wrote:
>> but all with the same file size? I wonder if the guys are
>> implementing some kind
>> of checksum too, then it won't matter at all, I assume...
> 
> There was some discussion about that in summer, and yes, there are of 
> course checksums. At the time it was found that SHA256 would be 
> sufficient IIRC, but I don't know what Paul has used now.
> 
> So there shouldn't be any realistic collision possible.

I'm using sha1, not sha256.

And I'm hashing MIME payloads and MIME headers separately, not full MIME parts.
That means that even if the same file is sent under different names (myfile.pdf
and yourfile.pdf), that file is stored only once, since the identical payload
yields the same hash.
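
To illustrate the point about file names, here is a minimal Python sketch (not
DBMail's actual C code). The helper name part_key, the in-memory dict standing
in for the table, and the sample header and payload bytes are all assumptions
made up for the example.

```python
import hashlib

def part_key(data: bytes) -> str:
    """Hex sha1 digest of one stored piece (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()

# Two attachments: different names in the headers, identical payload.
headers_a = b"Content-type: application/pdf; name=myfile.pdf\n"
headers_b = b"Content-type: application/pdf; name=yourfile.pdf\n"
payload = b"JVBERi0xLjQK"  # made-up base64 data, the same in both messages

store = {}  # stand-in for the mimeparts table: key -> data
for blob in (headers_a, payload, headers_b, payload):
    # setdefault only inserts when the key is not present yet
    store.setdefault(part_key(blob), blob)
```

The store ends up with three entries: the two header blocks differ, so each
gets its own key, while the shared payload is kept once.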


A very simple message:

--------------------------
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: asdf

Ipsum lorem ad nauseum.
--------------------------

will be split like:

----<cut>----
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: asdf
----<cut>----
Ipsum lorem ad nauseum.
----<cut>----
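
The cut for a simple message amounts to splitting at the first blank line. A
rough Python sketch, assuming LF line endings and made-up example.org
addresses (split_simple is a hypothetical name, not a DBMail function):

```python
def split_simple(raw: str) -> tuple[str, str]:
    """Split a non-multipart message at the first blank line."""
    headers, _, body = raw.partition("\n\n")
    return headers, body

raw = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: asdf\n"
    "\n"
    "Ipsum lorem ad nauseum.\n"
)
headers, body = split_simple(raw)
```

Here headers holds the three header lines and body holds the text line; each
of the two would then be hashed and stored on its own.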


A multipart message:

--------------------------
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: sadf
MIME-Version: 1.0
Content-type: multipart/mixed; boundary=boundary

This message will not be seen by the user

--boundary
Content-type: text/html
Content-disposition: inline

<html><head></head><body>Message</body></html>

--boundary
Content-type: text/plain; charset=us-ascii; name=testfile
Content-transfer-encoding: base64

IyEvYmluL2Jhc2gNCg0KY2xlYXINCmVjaG8gIi4tLS0tLS0tLS0tLS0tLS0t
IyEvYmluL2Jhc2gNCg0KY2xlYXINCmVjaG8gIi4tLS0tLS0tLS0tLS0tLS0t
IyEvYmluL2Jhc2gNCg0KY2xlYXINCmVjaG8gIi4tLS0tLS0tLS0tLS0tLS0t
--boundary--
--------------------------

will be cut along the following lines:

----<cut>----
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: sadf
MIME-Version: 1.0
Content-type: multipart/mixed; boundary=boundary

----<cut>----
This message will not be seen by the user
----<cut>----
Content-type: text/html
Content-disposition: inline
----<cut>----
<html><head></head><body>Message</body></html>
----<cut>----
Content-type: text/plain; charset=us-ascii; name=testfile
Content-transfer-encoding: base64

----<cut>----
IyEvYmluL2Jhc2gNCg0KY2xlYXINCmVjaG8gIi4tLS0tLS0tLS0tLS0tLS0t
IyEvYmluL2Jhc2gNCg0KY2xlYXINCmVjaG8gIi4tLS0tLS0tLS0tLS0tLS0t
IyEvYmluL2Jhc2gNCg0KY2xlYXINCmVjaG8gIi4tLS0tLS0tLS0tLS0tLS0t
----<cut>----

Notice how the '--boundary' strings are missing after splitting. Each part
between the ----<cut>---- lines will be stored as a separate mimepart in the
dbmail_mimeparts table using its sha1 hash as its primary key. Message parts
are linked together using the partlists table. Using this partlists table,
messages are reconstructed during retrieval, adding back the boundary strings
as indicated by the boundary field in the Content-type header.
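
The store-and-rebuild idea can be sketched in a few lines of Python. This is
only an illustration for a single, non-nested multipart message: the dict and
list are in-memory stand-ins for the two tables, the (kind, key) row layout is
an assumption rather than the real schema, and the part contents are shortened
versions of the example above.

```python
import hashlib
import re

mimeparts = {}  # sha1 hex digest -> data (stand-in for dbmail_mimeparts)
partlist = []   # ordered (kind, key) rows for one message (stand-in for partlists)

def store(kind: str, data: str) -> None:
    key = hashlib.sha1(data.encode()).hexdigest()
    mimeparts.setdefault(key, data)  # identical data is stored only once
    partlist.append((kind, key))

# The pieces of a multipart message, in order, with the boundaries stripped.
store("header", "Content-type: multipart/mixed; boundary=boundary")
store("body", "This message will not be seen by the user")
store("header", "Content-type: text/html")
store("body", "<html><head></head><body>Message</body></html>")
store("header", "Content-type: text/plain; name=testfile")
store("body", "IyEvYmluL2Jhc2g=")

def rebuild(rows) -> str:
    # Recover the boundary string from the top-level Content-type header.
    top_header = mimeparts[rows[0][1]]
    boundary = re.search(r'boundary="?([^";\s]+)', top_header).group(1)
    out = []
    for i, (kind, key) in enumerate(rows):
        if kind == "header" and i > 0:
            out.append("--" + boundary)  # every later header block opens a sub-part
        out.append(mimeparts[key])
        if kind == "header":
            out.append("")  # blank line between headers and payload
    out.append("--" + boundary + "--")  # closing delimiter
    return "\n".join(out)

text = rebuild(partlist)
```

The rebuilt text has the two '--boundary' separators and the closing
'--boundary--' line back in place, even though none of them were stored.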

The same pattern applies to nested mime constructs as well, such as
multipart/mixed messages that contain message/rfc822 mimeparts that contain
multipart/mixed messages. We simply recurse into the nested structure. Well,
not quite that simple, but it works great.
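
For a feel of the recursion, here is a sketch using Python's standard email
package instead of DBMail's own parser. The function name cut, the depth
counter, and the sample message are assumptions for illustration; headers are
re-joined from the parsed fields, so the hashed bytes would not match what
DBMail stores.

```python
import email
import hashlib

def sha1(text: str) -> str:
    return hashlib.sha1(text.encode()).hexdigest()

def cut(msg, depth=0):
    """Yield (depth, kind, sha1) for each header block and payload, recursively."""
    header_text = "\n".join(f"{k}: {v}" for k, v in msg.items())
    yield depth, "header", sha1(header_text)
    if msg.is_multipart():
        # True for multipart/* and for message/rfc822 containers alike.
        if msg.preamble:
            yield depth, "body", sha1(msg.preamble)
        for sub in msg.get_payload():
            yield from cut(sub, depth + 1)
    else:
        yield depth, "body", sha1(msg.get_payload())

raw = (
    "Subject: outer\n"
    "MIME-Version: 1.0\n"
    "Content-type: multipart/mixed; boundary=outer\n"
    "\n"
    "--outer\n"
    "Content-type: text/plain\n"
    "\n"
    "hello\n"
    "--outer\n"
    "Content-type: message/rfc822\n"
    "\n"
    "Subject: inner\n"
    "Content-type: text/plain\n"
    "\n"
    "hello\n"
    "--outer--\n"
)
rows = list(cut(email.message_from_string(raw)))
```

Each row carries the nesting depth, so the embedded message's own headers and
body show up one level below the message/rfc822 part that wraps them.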

-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
