Consider the following tables:

CREATE TABLE `dbmail_messageblks` (
 `messageblk_idnr` bigint(21) NOT NULL auto_increment,
 `physmessage_id` bigint(21) NOT NULL default '0',
 `messagepart_order` int NOT NULL,
 `messagepart_idnr` bigint(21) NOT NULL,
 `mime_header` tinyint(1) NOT NULL DEFAULT '0',
 PRIMARY KEY  (`messageblk_idnr`),
UNIQUE KEY `messageblk_idnr_message_part` (`messageblk_idnr`,`messagepart_idnr`),
 KEY `physmessage_id_index` (`physmessage_id`),
 KEY `messagepart_id_index` (`messagepart_idnr`),
CONSTRAINT `dbmail_messageblks_ibfk_1` FOREIGN KEY (`physmessage_id`) REFERENCES `dbmail_physmessage` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `dbmail_messageparts` (
 `messagepart_idnr` bigint(20) NOT NULL auto_increment,
 `message` longblob,
 `size` bigint(20) NOT NULL,
 `hash` varchar(64) NOT NULL,
 PRIMARY KEY  (`messagepart_idnr`),
 UNIQUE KEY `hash_size` (`hash`,`size`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and a message constructed like follows

->Message
-Non Mime Message or Mime Warning "This is a multi-part message in MIME format."
   ->Plain Text
   ->HTML
   ->JPEG
   ->Attached Message 1
-Non Mime Message or Mime Warning "This is a multi-part message in MIME format."
      ->Plain Text
      ->HTML
      ->MPEG
   ->Attached Message 2
-Non Mime Message or Mime Warning "This is a multi-part message in MIME format."
      ->Plain Text
      ->HTML
      ->GIF

Would be come these message parts

begin
message_part=0   Root Message Headers
message_part=1   Root Non-Mime Message
message_part=2   Boundry and headers ("Plain Text")
message_part=3   Body ("The Plain Text")
message_part=4   Boundry and headers ("HTML")
message_part=5   Body ("The HTML")
message_part=6   Boundry and headers ("JPEG Attachment")
message_part=7   Body ("Base64 of the JPEG")
message_part=8   Boundry and headers ("Attached Message 1")
message_part=9   Attached message 1 Headers
message_part=10  Attached message 1 Non-Mime
message_part=11  Attached message 1 Boundry and headers ("Plain Text")
message_part=12  Attached message 1 Body ("Plain Text")
message_part=13  Attached message 1 Boundry and headers ("HTML")
message_part=14  Attached message 1 Body ("HTML")
message_part=15  Attached message 1 Boundry and headers ("MPEG")
message_part=16  Attached message 1 Body ("MPEG")
message_part=17  Attached message 1 Closing boundry
message_part=18  Boundry and headers ("Attached Message 2")
message_part=19  Attached message 2 Headers
message_part=20  Attached message 2 Non-Mime
message_part=21  Attached message 2 Boundry and headers ("Plain Text")
message_part=22  Attached message 2 Body ("Plain Text")
message_part=23  Attached message 2 Boundry and headers ("HTML")
message_part=24  Attached message 2 Body ("HTML")
message_part=25  Attached message 2 Boundry and headers ("GIF")
message_part=26  Attached message 2 Body ("GIF")
message_part=27  Attached message 2 Closing Boundry
message_part=28  Root Message Closing Boundry
end

While a simple non-mime message would be
begin
message_part=0   Message Headers
message_part=1   Message Body
end


I propose using sha256 as it will give a more unique key at 64 chars than sha1 at 40 chars and only doing the sha256 on the first 8 or 16 MB. I can't think of any real data files that could result in the same size and sha256 of the first 8 MB. This could also be configurable in the dbmail.conf to what ever size the admin choses, and could be changed later, but would require dbmail to be shut down and the dbmail-util run to update every hash in the database to use the new value.

Features:
All parts of the messages can be stored uniquely.

Duplicate attachments/parts - even if they were renamed - can't occur - while the original attached file name remains intact.

I purposly left the boundry's in there so that original message could be re-built in SQL into its original complete data stream for external use or mail export.

The is_header field can be removed as message_part=0 will always contain the headers of the message.

The mime_header field can be used to find an attachment when looking to download only certain parts.

Rebuiling of headers can be done quickly with message_part=0

The messagepart_order allows the inserting of multiple message parts to be done by threads Message above would have a total of 29 threads spawned, though they may not all be spawned at the same due to thread/memory limits. 1. The thread is told the messagepart_order number and passed the data to be inserted.
   2. The size is calculated and the first 8 MB of the data are sha256'ed.
Shorter data would be ok as we would then have a sha256 on the enitre part. 3. Do a quick select on the db to check if the hash and size is already there.
   4. If the data is not there - insert the data
   5. Get the messagepart_idnr for the size and hash.
6. Insert the messageparts row with the matching order and messagepart_idnr

Issues/Slowdowns:
Deletion of messageparts would require a check to see if the part is no longer required. SELECT messagepart_idnr, COUNT(physmessage_id) FROM dbmail_messageblks WHERE messagepart_idnr IN (SELECT messagepart_idnr FROM dbmail_messageblks WHERE physmessage_id='12345' GROUP BY messagepart_idnr) GROUP BY messagepart_idnr; This query would return a row for each unique part in the deleted message. The part could be deleted when the count=1. This is assuming that every message has a phymessage_id and that messages sent to multiple mailboxes duplicate the phymessages and not share one.

-Jon

Paul J Stevens wrote:

Charles Marcus wrote:
DBmail isn't a email system.  It's an imap/pop server.

If you want to control this then configure it in your mail server,
like postfix....
Again - saving drafts bypasses the MTA - and I think it would be a good
thing for DBMail to be able to catch these attempts like this to bypass
the limits set by the sys admin...

The size of any message (and attachment in the near future) is known
whenever it passes through dbmail. Adding a couple of runtime options
delimiting their size in the config file is trivial.

Of course, we'd also have to deal with the fallout like handling max
messagesize exceptions and how to communicate it back to the console,
lmtp, or imap clients. That would require some special attention.



_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail

Reply via email to