Re: WORA Considered Evil ;-)

Pier Fumagalli Thu, 26 Jun 2003 15:31:02 -0700

NOTE: This is a technical digression on the ethics of MTAs, as Brian B.
inculcated in my brain over the past 4 years, or better, as I developed his
teachings. (Brian is master of Qmail, and Qmail is master of MTAs, being
master a transitive property). I am just a slow learner...

So, if you don't want to get into technical details about email, well, hit
DELETE now...

On 26/6/03 17:28, "Stefano Mazzocchi" <[EMAIL PROTECTED]> wrote:

> [...]
> So, one night, when I was visiting them in London, Pier and I sit down
> and talked about how feasible/useful/dangerous was to update our email
> infrastructure to JAMES. [despite the little coding I've done on it, I'm
> still emotionally attached to the idea of having an entire email
> infrastructure based on the beauty of java modularity and pluggability]
> 
> It turned out that Pier had pretty rock solid arguments *not* to use
> JAMES as a MTA and all came from the sysadm paranoia that he grew
> accustomed to (and which I totally lack, given my very basic sysadm
> skills and experience).
> 
> Unfortunately, I don't recall exactly what his arguments were, Pier, do
> you have a minute to chime in? I think the JAMES people would love to
> hear your criticism.

There are quite a number of advantages in running a native MTA compared to
a Java-only solution on UNIX systems, all derived from one single winning
point: multi-processing.

Let's try to identify the main components of an MTA:

The most important piece is the mail queue: a queue is a transient storage
where messages are held temporarily during the message processing stage.
There might be different queues per MTA (incoming, outgoing, in-process),
but one point is fundamental: the queue needs to be fast and reliable, and
the fewer messages there are in any queue at any time, the better.

Another vital part is the "injector", i.e. something that reads a message
from somewhere (file, network, another queue) and stores it in a queue.

The third part is the "despooler", which takes a message from the queue and
delivers it somehow (to a file, a pipe, through the network, or to another
queue).

The fourth and final component is the "processor", which is no more and no
less than the union of an injector and a despooler, but operating only on
queues (therefore, a processor reads a message from the queue, does
something with it, and puts it back into the queue).

Diagram:
           +----------+      +-------+      +-----------+
INPUT----->| injector |----->|       |----->| despooler |----->OUTPUT
           +----------+      | queue |      +-----------+
                         +-->|       |--+
                         |   +-------+  |
                         |              V
                        +----------------+
                        |    processor   |
                        +----------------+

Complicate it as much as you want, but this is the basic picture...

Add to this diagram another component, pervasive throughout the entire
drawing: a "controller", i.e. something that makes all those separate
components talk to each other.
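
The data flow between the four components in the diagram can be modelled in
a few lines. This is only a sketch of who hands what to whom: it runs in a
single process, so it deliberately leaves out the controller and the process
separation. All names (Queue, injector, processor, despooler) are
illustrative and not taken from any real MTA.

```python
from collections import deque
from typing import Callable, List, Optional


class Queue:
    """Transient storage: messages stay here only while being processed."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def put(self, message: str) -> None:
        self._messages.append(message)

    def take(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None

    def __len__(self) -> int:
        return len(self._messages)


def injector(source: List[str], queue: Queue) -> None:
    """Read messages from somewhere (here: a list) and store them in a queue."""
    for message in source:
        queue.put(message)


def processor(queue: Queue, transform: Callable[[str], str]) -> None:
    """Take each queued message, do something with it, and put it back."""
    for _ in range(len(queue)):
        queue.put(transform(queue.take()))


def despooler(queue: Queue, sink: List[str]) -> None:
    """Take messages out of the queue and deliver them (here: to a list)."""
    while (message := queue.take()) is not None:
        sink.append(message)
```

Calling injector(), then processor(), then despooler() on the same Queue
moves a message along the path INPUT -> queue -> processor -> queue -> OUTPUT.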

All those components must run asynchronously, independently, completely
separated from one another and (for security) under different user
privileges. NOTHING (apart from the master controlling daemon, doing
nothing) runs as root.

The lifecycle of the MTA, then, is the following:

1) the controller starts up (root) and binds to all required listening ports

2a) once a connection from the input is established (to the controller), the
    controller forks, downgrades to the "queue" user and executes the
    "queue" process

2b) the controller forks again, downgrades to "injector", and executes the
    "injector" process.

2c) the controller connects (usually via pipes, but it could also work over
    the local network) the output of the newly created "injector" with the
    input of the "queue" created in step 2a.

2d) the message is read from the original INPUT as the "injector" user, and
    "piped" to the queue, where the other process stores it as the "queue"
    user. No I/O happens as ROOT (call it defensive programming, Brian B.,
    late 1999).

3) once the message is in the queue, if required the controller connects
   "queue" with "processor" and again with "queue" in a similar way as
   described in step 2. This happens as many times as it is required (a
   message can be re-injected, altered, god knows, but again, nothing
   works as "root" and everything is isolated from anything else).

4) once the message doesn't require further processing, again as in step
   2, the controller connects "queue" with "despooler" and sends the
   message.

So, overall, every single part is completely isolated from every other,
nothing runs as a privileged user, and no process has the power to interact
with or disrupt the operation of another, apart from the controller, and all
it does is "create pipes, fork, downgrade, and execute".

Notably, each interaction is transactional (so, for example, unless the
"queue" process terminates successfully, the SMTP injector won't report to
the other end that the message has been accepted, and so on)... No messages
are lost (in theory).
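
A sketch of that transactional hand-off, assuming the queue process signals
success through its exit status: the injector acknowledges the message only
after the queue process has exited with 0. The function name and the command
passed in are hypothetical; 250 and 451 are the standard SMTP reply codes for
"accepted" and "temporary local failure".

```python
import subprocess
import sys
from typing import List


def accept_message(message: bytes, queue_cmd: List[str]) -> str:
    """Hand the message to a queue process; acknowledge only if it exits 0."""
    result = subprocess.run(queue_cmd, input=message)
    if result.returncode == 0:
        # The queue has taken responsibility, so the sender may forget it.
        return "250 ok"
    # The sender keeps its copy and retries later: nothing is lost.
    return "451 queue failure, try again later"
```

If the queue process crashes halfway through, the sending side never sees a
250 and therefore still owns the message.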

You see how multi-processing can hugely help in terms of reliability and
security, but there are several other advantages: every process is TINY (on
qmail, in the order of 1 megabyte), it's fast to create, it's fast to
destroy, and it runs in its own little sandbox: if it dies (out of memory?)
all other running processes are untouched...
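
The "if it dies, the others are untouched" point is easy to try out. In this
sketch each worker is a separate operating system process, one of them
simulates running out of memory, and the other still finishes normally. The
worker names and their one-line scripts are made up for the illustration.

```python
import subprocess
import sys
from typing import Dict

# Hypothetical workers: one crashes, one does a little work and exits cleanly.
WORKERS = {
    "smtp-injector": "raise MemoryError('simulated crash')",
    "despooler": "import time; time.sleep(0.1)",
}


def run_workers(scripts: Dict[str, str]) -> Dict[str, int]:
    """Start every worker as its own process and collect each exit code."""
    procs = {
        name: subprocess.Popen([sys.executable, "-c", code],
                               stderr=subprocess.DEVNULL)
        for name, code in scripts.items()
    }
    return {name: proc.wait() for name, proc in procs.items()}
```

run_workers(WORKERS) reports a non-zero exit code for the crashed injector
and 0 for the despooler; had both been threads in one runtime, a fatal error
in one could have taken the other down with it.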

That is _BY_FAR_ the best architecture ever: it might not be the fastest
one, but it is for sure the most secure and reliable.

Plus, most of the time you get the advantage of being able to run other
processes alongside. For example, anti-virus engines, anti-spam engines, or
even MUAs (mail user agents, like IMAP/POP3 servers) are all tiny little
things: they come packaged as simple binaries, and can be executed
completely independently of, and completely separated from, the whole mail
injection-processing-despooling thing.

Now, take our (betaversion) example:

> Now, Pier, Fede and I share our email infrastructure on betaversion.org.
> It's a pretty complex (and very powerful) setup made with
> qmail+cyrus+bogofilter+sieve+Horde/IMP
> [...]

Qmail works as described above (N processes running as N users doing the
different bits and bobs), bogofilter runs as a "processor" completely
separated from the Qmail processors (the ones doing alias rewriting and
stuff), cyrus runs as an injector (and is itself split into several
different processes as well), sieve/horde and the like run under Apache...

What do I get from all this "separation" and independence? Example:

Qmail fails (or I have to take it down for some odd reason)? Stefano can
still read his email from Horde via Apache on his IMAP store running on
Cyrus.

Cyrus crashes? Well the message my mom sent me at the same time is queued by
Qmail and will be delivered when Cyrus comes back up.

I start disliking Qmail? Simple, I install postfix and don't touch
anything else in my entire configuration...

It's a "concerto", as Stefano correctly pointed out, of interconnected but
completely independent and self-reliant pieces of software, and it works...

Now, when I look at James, I see a nice mail server, yeah, cool, but it has
everything inside it... SMTP server, QUEUE, mailing list processor, MUA,
SMTP client, web server, EVERYTHING running in one big huge process, all
with the same privileges from the OS point of view, and you know what? If my
SMTP engine causes a JVM internal error, my IMAP, my webmail, my mailing
lists, and my outgoing SMTP queue are stalled as well...

NOT nice. Actually, it looks very much like Lotus Notes running on a
Windoze Server... Bulky, monolithic, and hardly scalable or interoperable
with other software...

    Pier

