On Jun 1, 2007, at 10:30 AM, Michael Monnerie wrote:
On Friday, June 1, 2007, Jake Anderson wrote:
So you are moving to splitting the messages on logical boundaries (i.e.
message bodies, headers, attachments) rather than the 512k blocks? One
would think that would help performance on large messages greatly,
especially if you can (optionally?) move the big files out of the DB.
I strongly oppose moving anything out of the DB. After all, we use the
DB precisely to get its advantages: scalability, (high) availability,
replication, etc. If there are external files, who copies them to the
other servers?
As I understood it, the 512k chunks will stay; they will just be
organized into pieces by attachment, MIME part, etc. There will be a
checksum of each file (MD5 or so), so that two different e-mails can
link to the same attachment.
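(A rough sketch of that linking idea, not DBmail's actual schema or
code; the store_attachment() helper and the in-memory dicts below are
made up to stand in for whatever tables end up holding this:)

    import hashlib

    # Content-addressed store: the digest of the attachment bytes is the
    # key; each message only keeps a reference to that key.
    attachment_store = {}    # digest -> attachment bytes
    message_parts = {}       # message id -> list of digests

    def store_attachment(message_id, data):
        digest = hashlib.md5(data).hexdigest()
        if digest not in attachment_store:   # first time we see this content
            attachment_store[digest] = data
        message_parts.setdefault(message_id, []).append(digest)
        return digest

    # Two e-mails carrying the same file end up sharing one stored copy.
    report = b"%PDF-1.4 same quarterly report"
    store_attachment("mail-to-joe", report)
    store_attachment("mail-to-charlie", report)
    assert len(attachment_store) == 1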
Still, there can be the problem of checksum collisions, especially in
really large DBs, so I'd suggest using two checksum algorithms. Or
maybe make an MD5 sum over each 512k chunk, then apply MD5 and SHA-1 on
those checksums only, for a quicker calculation.
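(Only a sketch of that suggestion, assuming the 512k chunking stays as
is; nothing here is existing DBmail code:)

    import hashlib

    CHUNK = 512 * 1024

    def dedup_key(data):
        # MD5 over each 512k chunk...
        chunk_digests = [hashlib.md5(data[i:i + CHUNK]).digest()
                         for i in range(0, len(data), CHUNK)]
        combined = b"".join(chunk_digests)
        # ...then MD5 and SHA-1 over those chunk digests only, so the
        # second pass stays cheap even for very large attachments.
        return (hashlib.md5(combined).hexdigest(),
                hashlib.sha1(combined).hexdigest())

Two attachments would only be linked to the same stored copy if both
parts of the key match.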
Under no circumstances should it ever happen that the "high secret
attachment" of user A has the same checksum as the "funny pic" from
user B, and the DB then links user B to user A's "high secret
attachment".
How often do you really have a shared file among users?
As an overall percentage of email?
As a percentage of email with attachments?
Typically, when you have email sent to multiple recipients (Joe and
Charlie), isn't the email content already stored once in the DB and
shared between the two users? I'm not talking about a shared mailbox
here, just shared data for the same email.
As for the same file being sent to two different people in separate
emails -- I'm not sure that happens often enough to merit the
effort/risk.
What is the probability of MD5 collision?
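(For a rough sense of scale, the usual birthday-bound estimate,
assuming accidental MD5 collisions behave like collisions of random
128-bit values; the billion-attachment figure is just an example:)

    import math

    # Birthday bound: the chance of at least one accidental collision
    # among n random 128-bit digests is roughly n*(n-1) / 2**129.
    def collision_probability(n, bits=128):
        return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

    print(collision_probability(10 ** 9))   # ~1.5e-21 for a billion attachments

So accidental collisions are negligible even for huge stores; the more
realistic worry is someone deliberately constructing two inputs with
the same MD5, which is what the second-checksum suggestion above is
meant to guard against.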