[Mailman-Developers] Re: Threads and robustness against runner crashes

2024-03-04, Mark Sapiro

On 3/4/24 8:00 AM, Stephen J. Turnbull wrote:

> Split thread #2.
>
> Justus Winter writes:
>
>   > and when a runner has picked up a mail from a queue, and then
>   > crashes, that mail is lost forever (i.e. runner operations are not
>   > atomic).
>
> Please report such incidents in as much detail as you can.  The whole
> point of "store and forward" is to prevent that.  Runners should not
> alter the queuefile until they're done.  If they crash in the middle,
> they should leave the queuefile they received and maybe a work file.


The actual process of picking up a queue entry[1] atomically renames the 
queue file from .pck to .bak so until the runner finishes processing the 
file and removes the .bak, there is always a .pck or .bak file in the 
queue. If the runner dies for any reason in processing, whether because 
of a crash or external event, upon restart it will process the .bak 
file(s) so messages are never lost for this reason.
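A simplified sketch of the claim-by-rename pattern described above. It assumes a single queue directory of pickled entries whose file names sort in arrival order. The helper names (`enqueue`, `dequeue`, `finish`, `recover`) are hypothetical, and this is not the actual code in Mailman's `switchboard.py`. It only illustrates why a crash between `dequeue` and `finish` cannot lose the entry.

```python
import os
import pickle
import tempfile
from pathlib import Path

# Hypothetical helpers illustrating the .pck -> .bak pattern; not Mailman's API.


def enqueue(queue_dir: Path, name: str, msg: dict) -> Path:
    """Write an entry under a temporary name, then rename it to .pck.

    Runners only look at *.pck, so they never see a half-written file.
    """
    tmp = queue_dir / f"{name}.tmp"
    with open(tmp, "wb") as fp:
        pickle.dump(msg, fp)
        fp.flush()
        os.fsync(fp.fileno())
    final = queue_dir / f"{name}.pck"
    os.rename(tmp, final)  # atomic on POSIX within one filesystem
    return final


def dequeue(queue_dir: Path):
    """Claim the first .pck entry by renaming it to .bak; return (bak_path, msg) or None."""
    for pck in sorted(queue_dir.glob("*.pck")):
        bak = pck.with_suffix(".bak")
        try:
            os.rename(pck, bak)  # the claim itself is one atomic rename
        except FileNotFoundError:
            continue  # another runner claimed it first
        with open(bak, "rb") as fp:
            return bak, pickle.load(fp)
    return None


def finish(bak: Path) -> None:
    """Called only after processing succeeded: now the entry may disappear."""
    bak.unlink()


def recover(queue_dir: Path) -> None:
    """At start-up, requeue entries left behind by a runner that died."""
    for bak in queue_dir.glob("*.bak"):
        os.rename(bak, bak.with_suffix(".pck"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        q = Path(d)
        enqueue(q, "0001", {"subject": "hello"})
        bak, msg = dequeue(q)
        # Pretend the runner crashed here, before finish(bak) was called.
        recover(q)             # what a restarted runner does first
        bak, msg = dequeue(q)  # the entry is still there
        finish(bak)
        print(msg)
```

At every point in this sequence the entry exists on disk as either `.pck` or `.bak`. Only `finish` removes it.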


[1] https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/core/switchboard.py?ref_type=heads#L151


--
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

___
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9


[Mailman-Developers] Re: Introduction, FOSDEM, scaling down, latency, OpenPGP support

2024-03-04, Stephen J. Turnbull
Justus Winter writes:

 > But currently Mailman3 does fork+exec, so it doesn't get to share
 > the parent's pages.  I experimented with fork-and-dont-exec [0],
 > but the results were underwhelming, because reference counting can
 > cause pages to diverge.  Surprisingly, gc.freeze didn't seem to
 > help much, so there may have been issues beyond the reference
 > counts.
 > [...]
 > I think Python just doesn't support sharing code across processes
 > well.

Seems likely.  I know that Emacsen have always advised running just
one process for this reason (also because users usually want all their
recent hacks available in all buffers, but memory hogging is a big
reason).
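For readers unfamiliar with the fork-without-exec approach mentioned above, here is a minimal POSIX-only sketch. It is not Justus's prototype, and `start_worker` is a hypothetical name. It follows the sequence recommended in the Python `gc.freeze()` documentation: disable the collector early in the parent, freeze right before `fork()`, and re-enable it in the child. Freezing only keeps the cyclic collector from writing to inherited objects. Ordinary reference-count updates still write to object headers, which is consistent with pages diverging anyway.

```python
import gc
import os
import sys
import traceback


def start_worker(target) -> int:
    """Fork a child that runs target() in this interpreter, without exec().

    Returns the child's pid.  POSIX only (os.fork is unavailable on Windows).
    """
    # Move every object tracked so far into the permanent generation, so a
    # collection in the child does not touch (and copy) the parent's pages.
    gc.freeze()
    pid = os.fork()
    if pid == 0:  # child process
        gc.enable()
        status = 0
        try:
            target()
        except BaseException:
            # Catch everything so the child never falls through into the parent's code.
            traceback.print_exc()
            status = 1
        finally:
            sys.stdout.flush()
            sys.stderr.flush()
        # _exit skips the atexit handlers the child inherited from the parent.
        os._exit(status)
    return pid


if __name__ == "__main__":
    gc.disable()  # early in the parent, per the gc.freeze() docs
    pids = [start_worker(lambda n=n: print("worker", n)) for n in range(3)]
    for pid in pids:
        os.waitpid(pid, 0)
```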



[Mailman-Developers] OpenPGP support

2024-03-04, Stephen J. Turnbull
This is split thread #3.

Justus Winter writes:

 > >  >   - Implement OpenPGP support
 > >
 > > What does that mean?
 > 
 > OpenPGP can be used to provide confidentiality and integrity for
 > email.  What exactly that means in the setting of mailing lists
 > varies by threat model and policy.

I was afraid you'd say that.  I mean, it's the right generic answer,
but I've yet to see a viable use case with a plausible threat model
for any of the implementations proposed.

 > My prototype [2] simply records associations between addresses and
 > OpenPGP certificates by consuming Autocrypt headers [3] and when
 > sending an outgoing mail opportunistically encrypting it if a
 > certificate is known.

Except for the Autocrypt part, this has been done.  But there are two
problems: nobody wants it very badly (see this post specifically

and the surrounding thread is also valuable because you'll see all the
reasons why I don't want to do this in Mailman at present, and you're
the first person in decades I think has a good shot at convincing me
otherwise! :-)  The second problem is I don't see a convincing use
case.  Note: I don't consider the opportunistic encryption aspect a
serious flaw.  Obviously this initial proposal is mostly a proof-of-
concept and most (all?) serious applications simply wouldn't send
unencrypted mail.

Steve
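The prototype described above is not shown in this thread, so the following is only a sketch of the general idea, with hypothetical names (`parse_autocrypt`, `CertStore`). It remembers the key data from a sender's `Autocrypt:` header and, when sending, chooses encryption only for recipients with a known key. The parser follows a simplified reading of the Autocrypt Level 1 header rules: `addr` and `keydata` are required, and an unknown attribute without a leading underscore invalidates the header. The OpenPGP encryption step itself is deliberately left out.

```python
from email.message import EmailMessage
from email.utils import parseaddr
from typing import Optional


def parse_autocrypt(value: str) -> Optional[tuple[str, str]]:
    """Return (addr, keydata) from an Autocrypt header value, or None if invalid."""
    attrs = {}
    for part in value.split(";"):
        if not part.strip():
            continue
        name, sep, val = part.strip().partition("=")  # base64 '=' padding stays in val
        if not sep:
            return None
        attrs[name.strip()] = val.strip()
    known = {"addr", "prefer-encrypt", "keydata"}
    # Unknown "critical" attributes (no leading underscore) invalidate the header.
    if any(k not in known and not k.startswith("_") for k in attrs):
        return None
    if "addr" not in attrs or "keydata" not in attrs:
        return None
    # Folded headers may put whitespace inside the base64 key data.
    return attrs["addr"].lower(), "".join(attrs["keydata"].split())


class CertStore:
    """Hypothetical address -> key-data map, fed by incoming mail."""

    def __init__(self) -> None:
        self._certs: dict[str, str] = {}

    def observe(self, msg: EmailMessage) -> None:
        header = msg.get("Autocrypt")
        if header is None:
            return
        parsed = parse_autocrypt(str(header))
        sender = parseaddr(str(msg.get("From", "")))[1].lower()
        if parsed and parsed[0] == sender:  # only trust a header matching From
            self._certs[sender] = parsed[1]

    def plan(self, recipient: str) -> str:
        """Opportunistic policy: encrypt if a key is known, otherwise send plaintext."""
        return "encrypt" if recipient.lower() in self._certs else "plaintext"
```

A stricter list policy would replace the `"plaintext"` branch with a refusal to send, which matches the point above that serious applications simply wouldn't send unencrypted mail.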


[Mailman-Developers] Threads and robustness against runner crashes

2024-03-04, Stephen J. Turnbull
Split thread #2.

Justus Winter writes:

 > >  > Here are the things I did so far:
 > >  > 
 > >  >   - I have Mailman running with runners in threads instead of
 > >  > processes, but that is in a proof-of-concept stage at this
 > >  > point and needs some cleaning up
 > >
 > > After working with Mailman 3 and Postfix, I've become fond of
 > > the HUPD (HUPD of Uncontrolled Proliferation of Daemons) model
 > > of application design, at least for email.
 > 
 > My prototype lets you choose, for every kind of runner, whether to
 > use the process or thread model

That's not a sales point, as far as I'm concerned.  It adds complexity
for the installer and the site manager, as well as in the code.

 > I don't quite buy (or maybe I'm not understanding the whole picture)
 > into the argument that having individual processes improves the
 > robustness of the whole system.

I'm talking about the developer/maintainer experience, not about run
time.

 > From my experience, having individual runners killed can render
 > Mailman unusable [0] (and to my then untrained eye it was
 > impossible to see that a runner was missing,

That's some combination of documentation, logging, and tooling bugs.
At the very least "mailman status" should report whether all the
runners that were started are still present (it doesn't at present).

It's really not hard to detect a crashed or stalled runner, even in a
sliced (multirunner) queue -- queuefiles start to pile up.  (By "not
hard" I mean you can use "ls" or "du", not that it should be obvious
what to do.)
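As a rough illustration of the "queuefiles start to pile up" check, here is a sketch that walks a queue root and reports queues whose entries exceed an age threshold. It assumes one subdirectory per queue holding `.pck` files. The function name and the 300-second default are arbitrary choices for the example, and this is not an existing `mailman status` feature.

```python
import time
from pathlib import Path
from typing import Optional


def stalled_queues(queue_root: Path, max_age: float = 300.0,
                   now: Optional[float] = None) -> dict:
    """Map queue name -> number of .pck files older than max_age seconds.

    A healthy runner drains its queue, so old entries suggest a dead or stuck runner.
    """
    now = time.time() if now is None else now
    report = {}
    for qdir in sorted(p for p in queue_root.iterdir() if p.is_dir()):
        old = [f for f in qdir.glob("*.pck") if now - f.stat().st_mtime > max_age]
        if old:
            report[qdir.name] = len(old)
    return report
```

Pointed at the installation's queue directory (assumed here to be something like `var/queue`), an empty result means nothing is waiting longer than the threshold.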

 > if on the other hand Mailman would have been a single process, or a
 > significantly smaller number of processes, a single missing process
 > would have been more apparent),

True, but to me crashes in a monolithic program are less acceptable,
especially threaded, because other concurrent operations may depend on
that program staying alive.  The way exception handling is done in
Mailman 2 with a big "except Exception" around the whole program, you
mostly would not get a crash at all, just a log message with a
traceback, probably unintelligible to a non-developer of Mailman.  Not
clear that's a win over the current situation for you.  Sure, you can
probably arrange for exception handling to be per-thread in some
sense, but that's going to be conceptually harder than the "log
the exception, let it crash, have the master restart it and pray"
approach we use in the multiprocess model.
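The "let it crash, have the master restart it" approach can be sketched as a supervisor loop. This is a simplified, hypothetical version (`supervise` is not Mailman's master). It supervises a single child command and gives up after a fixed number of abnormal exits, whereas a real master watches many runners at once.

```python
import logging
import subprocess
import sys
import time

log = logging.getLogger("master")


def supervise(argv: list, max_restarts: int = 3, backoff: float = 1.0) -> int:
    """Run argv as a child process; restart it whenever it exits abnormally.

    Returns how many restarts were performed.  Exit code 0 ends supervision.
    """
    restarts = 0
    while True:
        returncode = subprocess.run(argv).returncode
        if returncode == 0:
            return restarts  # clean shutdown
        # The child's traceback went to its own stderr; the master logs the exit.
        log.warning("runner %r exited with status %d", argv, returncode)
        if restarts >= max_restarts:
            log.error("giving up on %r after %d restarts", argv, restarts)
            return restarts
        restarts += 1
        time.sleep(backoff)  # avoid a tight crash/restart loop


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    supervise([sys.executable, "-c", "raise SystemExit(1)"], max_restarts=2, backoff=0.1)
```

Because each runner is a separate process, a crash only costs that runner's in-flight work, which the queue file pattern already protects.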

 > and when a runner has picked up a mail from a queue, and then
 > crashes, that mail is lost forever (i.e. runner operations are not
 > atomic).

Please report such incidents in as much detail as you can.  The whole
point of "store and forward" is to prevent that.  Runners should not
alter the queuefile until they're done.  If they crash in the middle,
they should leave the queuefile they received and maybe a work file.



[Mailman-Developers] Optimization for constrained environments and latency

2024-03-04, Stephen J. Turnbull
I'm going to split this into three separate threads with appropriate
titles.  This is #1.

Regarding optimization for constrained environments, I wrote:

 > > But I'm pretty sure people have run Mailman 3 on a Raspberry Pi.
 > > How constrained an environment are you aiming for?

Justus Winter writes:

 > I had problems on my shared hoster that provided 1 gigabyte of RAM
 > per user (I'm not 100% sure how they measure that).

OK, yes, my estimate says that's going to be too tight.  I'm seeing
about 80MB per runner, and a full complement of processes without any
slicing is 18.  Some of that is shared (IIRC about 5MB/runner
according to top), but that only buys back 1 runner's worth.  There
are several somewhat optional processes (the nntp, archive, command,
and 2 WSGI processes for Mailman Web), but even without them it's
probably still not quite going to fit into 1GB.
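The estimate above can be made explicit as back-of-the-envelope arithmetic. Assumptions: about 80 MB resident per process, about 5 MB of that saved per process through sharing, a 1 GB allowance taken as 1024 MB, and (last line only) that the five optional processes are among the 18 and are dropped.

```latex
\begin{aligned}
18 \times 80\ \text{MB} &= 1440\ \text{MB} && \text{full complement, no slicing} \\
18 \times 5\ \text{MB} &= 90\ \text{MB} \approx 1 \text{ runner} && \text{recovered by sharing} \\
1440 - 90 &= 1350\ \text{MB} > 1024\ \text{MB} \\
13 \times (80 - 5)\ \text{MB} &= 975\ \text{MB} && \text{optional processes dropped}
\end{aligned}
```

Even the last case leaves only about 49 MB of the allowance for everything else on the account.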

Re polling queues:

 > the runners are polling their queues in loops.  My installations
 > that hardly see any traffic at all are all doing: do I have work,
 > no, sleep 1, do I have work, no, sleep 1... I can see that this
 > will amortize in big installations, but for small ones this is
 > quite sad.

I guess, but if it doesn't show in the load average, I'm not sure why
one should care.  I don't know about your installation, but Mailman
consumes less than 1% of CPU when idle as far as I can tell.  For me
to support a change here, either you'd need to show a non-negligible
improvement or it would have to be "free" (see below).
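The polling loop described above looks roughly like the following sketch. The names are hypothetical, and this is not Mailman's runner code. With a 1-second sleep, each hop can add up to about one second of delay in an idle system. That is where the three-second worst case quoted below for three runners comes from.

```python
import time
from pathlib import Path


def run_polling(queue_dir: Path, handle, should_stop, interval: float = 1.0) -> None:
    """Process .pck files in queue_dir; sleep `interval` seconds whenever it is empty."""
    while not should_stop():
        batch = sorted(queue_dir.glob("*.pck"))
        if not batch:
            time.sleep(interval)  # "do I have work? no, sleep 1"
            continue
        for entry in batch:
            handle(entry)   # do the runner's work
            entry.unlink()  # remove the queue file only after success
```

When idle, each iteration costs one directory scan per second, which is why it barely registers in the load average.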

 > >  >   - Improve latency of messages
 > >
 > > What latency are you observing?

 > And even for big installations, or if we say that efficiency is not
 > important, if a mail goes through the hands of three queue runners,
 > the worst-case latency is three seconds in an otherwise idle
 > installation!  We can definitely improve upon that.

Who will notice?  Is there anybody who cares about a 3s latency in
list email?  If there is, that would be a user-visible improvement to
set against any increase in code complexity.

 > The key insight here is that emails in queues don't appear out of
 > thin air, another runner is putting them there.  If each runner
 > that goes to sleep does so by waiting on a condition variable
 > associated with its queue, and every runner that deposits a mail
 > into the queue signals the sleeping runners, that latency goes away
 > while at the same time improving efficiency by no longer having to
 > poll the queue every second.

Thing is, email (and Mailman specifically) operates on a store-and-
forward model.  The queue file *must* be present for a runner to do
its work, and conversely, if a file is present the runner has work to
do.  Polling is a little ugly, but it's a perfect fit for the problem
semantically, and very simple to explain and implement.

If the condition-variable-based code is equally simple, equally
reliable, and identical across our supported platforms, sure, that's
worth looking at because we get your developer-visible efficiency
enhancements for "free".  But if any of those requirements fails, I
would want to see an improvement in user-visible performance.
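The condition-variable idea quoted above can be sketched while keeping the queue file as the source of truth. In this version the consumer still scans the directory, and the condition only cuts the wait short. This is a hypothetical sketch (`QueueDir` is not a Mailman class). It only works when runners are threads in one process, as in Justus's thread prototype. Runners in separate processes would need a cross-process signal instead, which is not shown. The timeout keeps a polling fallback for files dropped by other processes.

```python
import threading
from pathlib import Path


class QueueDir:
    """A queue directory plus a condition variable that writers signal.

    The .pck files remain the source of truth; the condition only shortens
    the time a consumer sleeps before rescanning.
    """

    def __init__(self, path: Path):
        self.path = path
        self.cond = threading.Condition()

    def put(self, name: str, data: bytes) -> None:
        tmp = self.path / f"{name}.tmp"
        tmp.write_bytes(data)
        tmp.rename(self.path / f"{name}.pck")  # file is visible before we signal
        with self.cond:
            self.cond.notify_all()

    def wait_for_work(self, timeout: float = 1.0) -> list:
        """Return pending .pck paths, sleeping up to `timeout` seconds if there are none."""
        with self.cond:
            files = sorted(self.path.glob("*.pck"))
            if not files:
                # Holding the lock between the scan and wait() means a put()
                # that lands after the scan cannot signal until we are waiting.
                self.cond.wait(timeout)
                files = sorted(self.path.glob("*.pck"))
        return files
```

A consumer woken by `put` rescans immediately instead of finishing a one-second sleep. If no thread ever signals, the behavior degrades to the existing polling.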

 > > I will take a look at the work you mention, but it will be a
 > > couple of weeks at least before I have useful comments.

Still need some time for this, but I wanted to get some stuff out of
my inbox. :-)