1) What does MD fill in if you leave the $helo argument blank? Does it fill in the host's own hostname, try to send a blank HELO, or something else? I have one mimedefang-filter that I deploy on 5 machines, so it'd be nice not to have to customize this per machine. If MD doesn't fill in a blank with "the right thing", can I make this a feature request?



2) Has anyone set up a means of caching results? I don't want to hit my back-line servers constantly with these requests. I would prefer to have results cached for, say, 2 hours. I'm trying to think of a good way to do this.

One thought I had was to have each machine have an external database where the email address is the key, and it has 2 values: time last checked, and account state (ok, unknown, over-quota). Then I'd process it like this:

If (address is cached) && ((now - last_checked) <= cache_life)
   use the cached result

If (address is not cached) || ((now - last_checked) > cache_life)
   if the address is valid (via md_check_against_smtp_server() )
      if the address is an account
         if the account is over quota
            state = over-quota
         else
            state = ok
      else
         state = ok
   else
      state = unknown

   update the cache with the new result and last_checked time.
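
To make the steps above concrete, here is a minimal sketch in Python (a real mimedefang-filter would do this in Perl). It assumes the three checks are passed in as plain functions: is_valid stands in for the md_check_against_smtp_server() call, while is_account and is_over_quota are hypothetical hooks into the account system. The cache here is an ordinary dict, so this shows only the decision logic, not the storage.

```python
import time

CACHE_LIFE = 2 * 60 * 60  # cache lifetime: two hours, in seconds


def lookup_state(address, cache, is_valid, is_account, is_over_quota, now=None):
    """Return 'ok', 'unknown', or 'over-quota' for an address.

    cache maps address -> (last_checked, state). The three callables are
    placeholders: is_valid stands in for the SMTP callout, the other two
    for whatever the account system offers.
    """
    now = time.time() if now is None else now

    entry = cache.get(address)
    if entry is not None:
        last_checked, state = entry
        if now - last_checked <= CACHE_LIFE:
            return state  # fresh hit: no backend call at all

    # Missing or expired: ask the backend and work out the new state.
    if is_valid(address):
        if is_account(address) and is_over_quota(address):
            state = "over-quota"
        else:
            state = "ok"  # valid account under quota, or valid non-account
    else:
        state = "unknown"

    cache[address] = (now, state)  # record the result and the check time
    return state
```

Passing now explicitly is only there to make the expiry easy to exercise; in normal use it defaults to the current time.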


Anyone have thoughts about good and bad ways to do that?

I could just store it in a hash, but that means each child process keeps its own cache and does its own checks. That's potentially 30 children * 5 machines * 30,000 addresses = 4.5 million md_check_against_smtp_server() calls per 2 hours ... which doesn't even include the actual SMTP deliveries.
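
As a quick check on that worst case, the arithmetic can be written out as below. The function name is made up for illustration; the figures are the ones quoted above, and the second call shows what happens if the 30 children on a machine share a single cache instead.

```python
def worst_case_checks(caches_per_machine, machines, addresses):
    """Upper bound on backend checks per cache lifetime.

    Every independent cache has to look up every address once.
    """
    return caches_per_machine * machines * addresses


per_child_hash = worst_case_checks(30, 5, 30_000)   # one hash per child
per_machine_cache = worst_case_checks(1, 5, 30_000)  # one shared cache per machine

print(f"{per_child_hash:,}")     # 4,500,000
print(f"{per_machine_cache:,}")  # 150,000
```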

If I cache it in a local database, that's easy and cheap. I then cut that down to 5 machines * 30,000 addresses, or 150,000 calls per 2 hours. Plus, I can potentially cut it further by having an external process that goes through and cleans things up every hour or so (seed the database with known good addresses from our account management system; do the quota checks so they don't have to be done in real time, etc.). That might significantly cut down the number of calls. And, if I'm really confident about the seeding process, I might even be able to omit the md_check_against_smtp_server() calls entirely, because the seeding process has already told me everything I need to know.

I could also use an external database server, but then I'm introducing points of failure into the process, and shifting "lots of calls to the backend server" to "lots of calls to the database server".


I'm sort of leaning toward the "local database" approach, but I've never really played with tied hashes (tie) and such before.
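
For what the "local database" approach might look like, here is a rough sketch using Python's built-in sqlite3 module as a stand-in; in a Perl filter the equivalent would probably be a hash tied to an on-disk file. The table name, column names, and function names are all made up for illustration. The point is that every child process on a machine opens the same file, so a result fetched by one child is visible to the others, and an external job can call seed() to pre-load known states.

```python
import sqlite3
import time

CACHE_LIFE = 2 * 60 * 60  # two hours, in seconds


def open_cache(path):
    """Open (or create) the on-disk cache shared by all children on a host."""
    db = sqlite3.connect(path, timeout=5.0)  # wait briefly if another child holds the lock
    db.execute(
        "CREATE TABLE IF NOT EXISTS addr_cache ("
        " address TEXT PRIMARY KEY,"
        " last_checked REAL NOT NULL,"
        " state TEXT NOT NULL)"
    )
    db.commit()
    return db


def get_fresh(db, address, now=None):
    """Return the cached state, or None if the entry is missing or expired."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT last_checked, state FROM addr_cache WHERE address = ?",
        (address,),
    ).fetchone()
    if row is None or now - row[0] > CACHE_LIFE:
        return None
    return row[1]


def put(db, address, state, now=None):
    """Store or overwrite one result together with its check time."""
    now = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO addr_cache (address, last_checked, state)"
        " VALUES (?, ?, ?)",
        (address, now, state),
    )
    db.commit()


def seed(db, known_states, now=None):
    """Pre-load states from an external source, e.g. an hourly account export."""
    for address, state in known_states.items():
        put(db, address, state, now)
```

A filter would call get_fresh() first and fall back to the real check plus put() only when it returns None. If the seeding job runs more often than the cache lifetime, seeded addresses should never reach the fallback.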



