Jonathan Feally wrote:
Consider the following tables:

CREATE TABLE `dbmail_messageblks` (
  `messageblk_idnr` bigint(21) NOT NULL auto_increment,
  `physmessage_id` bigint(21) NOT NULL default '0',
  `messagepart_order` int NOT NULL,
  `messagepart_idnr` bigint(21) NOT NULL,
  `mime_header` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY  (`messageblk_idnr`),
UNIQUE KEY `messageblk_idnr_message_part` (`messageblk_idnr`,`messagepart_idnr`),
  KEY `physmessage_id_index` (`physmessage_id`),
  KEY `messagepart_id_index` (`messagepart_idnr`),
CONSTRAINT `dbmail_messageblks_ibfk_1` FOREIGN KEY (`physmessage_id`) REFERENCES `dbmail_physmessage` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `dbmail_messageparts` (
  `messagepart_idnr` bigint(20) NOT NULL auto_increment,
  `message` longblob,
  `size` bigint(20) NOT NULL,
  `hash` varchar(64) NOT NULL,
  PRIMARY KEY  (`messagepart_idnr`),
  UNIQUE KEY `hash_size` (`hash`,`size`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and a message constructed like follows

->Message
-Non Mime Message or Mime Warning "This is a multi-part message in MIME format."
    ->Plain Text
    ->HTML
    ->JPEG
    ->Attached Message 1
-Non Mime Message or Mime Warning "This is a multi-part message in MIME format."
       ->Plain Text
       ->HTML
       ->MPEG
    ->Attached Message 2
-Non Mime Message or Mime Warning "This is a multi-part message in MIME format."
       ->Plain Text
       ->HTML
       ->GIF

Would be come these message parts

begin
message_part=0   Root Message Headers
message_part=1   Root Non-Mime Message
message_part=2   Boundry and headers ("Plain Text")
message_part=3   Body ("The Plain Text")
message_part=4   Boundry and headers ("HTML")
message_part=5   Body ("The HTML")
message_part=6   Boundry and headers ("JPEG Attachment")
message_part=7   Body ("Base64 of the JPEG")
message_part=8   Boundry and headers ("Attached Message 1")
message_part=9   Attached message 1 Headers
message_part=10  Attached message 1 Non-Mime
message_part=11  Attached message 1 Boundry and headers ("Plain Text")
message_part=12  Attached message 1 Body ("Plain Text")
message_part=13  Attached message 1 Boundry and headers ("HTML")
message_part=14  Attached message 1 Body ("HTML")
message_part=15  Attached message 1 Boundry and headers ("MPEG")
message_part=16  Attached message 1 Body ("MPEG")
message_part=17  Attached message 1 Closing boundry
message_part=18  Boundry and headers ("Attached Message 2")
message_part=19  Attached message 2 Headers
message_part=20  Attached message 2 Non-Mime
message_part=21  Attached message 2 Boundry and headers ("Plain Text")
message_part=22  Attached message 2 Body ("Plain Text")
message_part=23  Attached message 2 Boundry and headers ("HTML")
message_part=24  Attached message 2 Body ("HTML")
message_part=25  Attached message 2 Boundry and headers ("GIF")
message_part=26  Attached message 2 Body ("GIF")
message_part=27  Attached message 2 Closing Boundry
message_part=28  Root Message Closing Boundry
end

While a simple non-mime message would be
begin
message_part=0   Message Headers
message_part=1   Message Body
end


I propose using sha256 as it will give a more unique key at 64 chars than sha1 at 40 chars and only doing the sha256 on the first 8 or 16 MB. I can't think of any real data files that could result in the same size and sha256 of the first 8 MB. This could also be configurable in the dbmail.conf to what ever size the admin choses, and could be changed later, but would require dbmail to be shut down and the dbmail-util run to update every hash in the database to use the new value.


So if I add one line at the end of a word doc, this will be considered to be the same file unless the file format includes something unique within the first 8MB of the file?
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail

Reply via email to