I think the feature is called "MIME chunking" or "single instance storage." I was apparently wrong about the cost of searching for duplicates, as there is some sort of hashing scheme for this. I could be wrong about the details.

http://dbmail.10918.n7.nabble.com/Newbie-Question-single-instance-store-for-attachmens-td13048.html
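
To illustrate the general idea of a hashing scheme for single instance storage, here is a minimal sketch in Python. It is not DBMail's actual schema or code: the PartStore class, its field names, and the choice of SHA-256 are assumptions made up for the example. The point is only that a duplicate can be found by looking up a hash key, without comparing the new blob against every stored body.

```python
import hashlib


class PartStore:
    """Toy single-instance store (hypothetical, not DBMail's implementation).

    Each distinct blob is kept once and keyed by the hash of its bytes.
    """

    def __init__(self):
        self.blobs = {}     # hex digest -> stored bytes
        self.refcount = {}  # hex digest -> number of references to the blob

    def put(self, data: bytes) -> str:
        # The digest acts as the lookup key, so finding a duplicate is a
        # dictionary (or index) lookup rather than a scan of all bodies.
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key


store = PartStore()
attachment = b"pretend these are the bytes of a large attachment"
k1 = store.put(attachment)                  # first copy is stored
k2 = store.put(attachment)                  # second copy only adds a reference
k3 = store.put(b"To: alice@example.com")    # different bytes, new entry
```

In this sketch the two identical attachments share one stored blob, while the differing header text gets its own entry.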



On 4/13/2014 1:19 PM, KT Walrus wrote:
DBMail already does a lot of data deduplication (headers, attachments, etc.).  
I’m just not clear how far this goes and whether my turning a message to a list 
of recipients into multiple copies of the message with different To: and 
possibly different Message-Id: affects the data de-duplication.

If I should keep the headers the same for all copies of the message to get 
maximum data deduplication, I will.  I would just prefer that each recipient 
see only their own address in To: and not know about everyone else.

As for my “app”, it is a PHP app that uses the RoundCube Framework to provide 
an IMAP interface to the user for accessing their mailbox and some public 
mailboxes.  The user sends messages using SMTP and I have a milter to send the 
message to a special outbox mailbox (in DBMail).  Then, I have a PHP cron job 
that checks the outbox, retrieves the queued messages, preprocesses the message 
headers, and uses dbmail-deliver to send the message to the appropriate 
recipients.

I have all this working quite nicely.  But, I’m trying to figure out the best 
way to send a To: customized copy of each message to each recipient.

I need to understand how DBMail does data deduplication.

Kevin

On Apr 13, 2014, at 3:35 PM, Mark Winslow <furf...@omnicode.com> wrote:

I'm confused about what you're trying to accomplish.  I haven't used dbmail 
yet, but I've read up on it and am about to implement a test version.

When you talk about your "app," where in the mail delivery process are you 
forwarding the messages? My understanding is that dbmail is an IMAP server that 
implements the LMTP protocol to receive mail from a mail transfer agent (MTA) 
like Postfix or sendmail.  Does your app work by forwarding messages through an 
MTA or are you going to dup things in the database backend?  Or something else?

As for the alias/caching scheme you mention, it sounds very complex.  The 
simplest way of dealing with it would be if dbmail were to check globally for 
exact copies of message bodies.  It seems like it would be very expensive in 
terms of processing time because presumably unindexed message bodies would have 
to be checked against potentially millions of other message bodies.

If you knew you were duping the message bodies, you could give each large body 
a unique tag and reference that.  However, I doubt that dbmail does that, and 
I'm not sure if a plugin or something could be easily made to do it.  Duping 
the messages on the database would be easy.  The hard part would be hooking 
into dbmail's IMAP serving mechanism.

Just my take.


On 4/13/2014 10:49 AM, KT Walrus wrote:
I’m working on implementing a mailing list feature in my app.  Each user has 
their own mailing list with the mailing list recipient addresses stored in my 
database.  The user can send mail to the mailing list address and the message 
would be delivered to each recipient address for the list.

I would like to change the To: header from the mailing list address to the 
individual recipient address for each copy of the message delivered (and add a 
Reply-To: header to use for replying to the message to the group).  Basically, 
I don’t want recipients to see the original mailing list address or other 
recipient addresses in their email.

My question is:

If I change the To: header and use dbmail-deliver to deliver each changed 
message, will all copies of the messages be efficiently stored (given that each 
copy has a different To: header)?

Also, should I change the Message-Id: header in each copy of the message before 
using dbmail-deliver to send a copy of the message to an individual recipient?  
Does changing To: or Message-Id: affect storing of attachments?  I only want 
the attachment stored once regardless of the number of messages it is attached 
to.  I would like the message bodies and unchanged headers to be stored only 
once regardless of the number of copies of the message.

Or, would it be better to just change the To: header to “Undisclosed 
Recipients:” and keep the message headers and body the same in all the 
dbmail-deliver copies?
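
The header rewriting described above can be sketched with Python's standard email package. This is only an illustration under assumptions: the addresses, the sample message, and the helper names personalize and part_digests are hypothetical, and the sketch says nothing about how DBMail itself stores the result. It only shows that changing To: and adding Reply-To: leaves the bytes of the body and attachment parts identical across copies, which is what a per-part hashing scheme would key on.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_original() -> bytes:
    """Build a sample list message with a text body and one attachment."""
    msg = EmailMessage()
    msg["From"] = "owner@example.com"
    msg["To"] = "list@example.com"
    msg["Subject"] = "Hello list"
    msg.set_content("Body text shared by every copy.\n")
    msg.add_attachment(b"\x00\x01 attachment bytes", maintype="application",
                       subtype="octet-stream", filename="a.bin")
    return msg.as_bytes()


def personalize(raw: bytes, recipient: str, list_address: str) -> EmailMessage:
    """Return a copy addressed only to `recipient`, with replies going to the list."""
    msg = message_from_bytes(raw, policy=policy.default)
    del msg["To"]            # remove the list address before setting the new one
    msg["To"] = recipient
    del msg["Reply-To"]      # no-op if the header is absent
    msg["Reply-To"] = list_address
    return msg


def part_digests(msg: EmailMessage) -> list:
    """Hash the decoded payload of every non-multipart MIME part."""
    return [hashlib.sha256(part.get_payload(decode=True)).hexdigest()
            for part in msg.walk() if not part.is_multipart()]


raw = build_original()
alice = personalize(raw, "alice@example.com", "list@example.com")
bob = personalize(raw, "bob@example.com", "list@example.com")
same_parts = part_digests(alice) == part_digests(bob)
```

Here same_parts is true because only header fields differ between the two copies. Whether a given server deduplicates such copies depends on how it splits and hashes headers and parts.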

Kevin

_______________________________________________
DBmail mailing list
DBmail@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail