On Tue, Oct 25, 2011 at 09:57:50PM +0200, Stefan Unterweger wrote:
> * Pascal Stumpf on Wed, Oct 12, 2011 at 05:39:48PM +0200:
> > > Check your /var/log/lpd.errs.
>
> > Doesn't contain anything but "restarted" messages.
>
> > > Also, ktracing lpd with the -i flag might give a clue to what the
> > > lpd child is doing.
>
> > Apparently, it segfaults:
>
> > I remembered I had the "S" malloc flag set, so I removed
> > /etc/malloc.conf, and ta-daaa, works. So this is a bug in the lpd code.
> > I suspect it is somewhere in the "common" code for all lp programs, as
> > I've also experienced SIGSEGVs in lpc. I'll see if I can hunt it down
> > further if I have time ...
>
> I've had a very similar problem after last upgrading to -current.
> lpr'ing new jobs would spool them, but complain about 'unable to
> start daemon'. Restarting lpd, purging the queue and some other
> hocus-pocus eventually got the printing going again, but this was pretty
> much at random -- sometimes, it'd just work. (All that without the 'S'
> flag to malloc.conf, though.)
S is just better at hitting wrong usage of malloc. Without S, the
problem (basically a use-after-free) still existed. Due to the
non-deterministic behaviour of our malloc, the problem will only hit
now and then (without S).
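
To illustrate the class of bug (a generic sketch, not the actual code
that was fixed here): the classic use-after-free is walking a list
while freeing it. With a plain malloc the freed memory often still
holds its old contents, so the stale read appears to work; a stricter
malloc makes it far more likely to fault right away. The struct and
function names below are made up for the example.

```c
#include <stdlib.h>

/* Hypothetical queue entry; not a structure from the lpd sources. */
struct job {
	int		 id;
	struct job	*next;
};

/* Push a new entry on the front of the list and return the new head. */
struct job *
job_push(struct job *head, int id)
{
	struct job *j = malloc(sizeof(*j));

	if (j == NULL)
		return head;	/* allocation failed, list unchanged */
	j->id = id;
	j->next = head;
	return j;
}

/*
 * Free the whole list and return the number of entries freed.
 *
 * The buggy variant looks like this:
 *
 *	for (j = head; j != NULL; j = j->next)
 *		free(j);
 *
 * It reads j->next after j has been freed. Whether that crashes
 * depends on what the allocator did with the memory in the meantime.
 */
int
job_free_all(struct job *head)
{
	struct job *j, *next;
	int n = 0;

	for (j = head; j != NULL; j = next) {
		next = j->next;	/* read the link before free() */
		free(j);
		n++;
	}
	return n;
}
```

The fixed loop saves the link before calling free(), so it never
touches memory it no longer owns.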
>
> The patches from Otto and Todd (i.e., today's snapshot) made the problem
> disappear -- many thanks! The rest of the message is just for the
> archives (Googling for this kind of problem is an exercise in
> frustration...).
>
> The log was basically useless (the lpd master process _did_ see and log
> the new jobs, but then apparently did nothing about them). After digging
> through the code, it seems to be the same problem as Pascal's, that the
> lpd children were dying instead of working, and from then on the whole
> system gets out of sync.
>
> What stymied me was that the whole lpr/lpd code wasn't touched in
> years (except for mandoc stuff); since I'd upgraded from 4.7 in theory
> nothing should have changed, so everything should have still been
> working -- until I stumbled over this thread.
The problem in this case was in libc.
> Now that I've already waded through that code (and if my meagre C skills
> allow it), I'll try to gently add a few lines of diagnostic messages for
> the log, so that it isn't that difficult to hunt down this kind of
> problem in the future.
>
> So in this regard, what's the established practice in these situations?
> Is code for those kinds of base daemons expected to be correct or should
> there be a degree of 'mistrust'? Or in other words: Should lpd assume
> that its children will never segfault, or should it assume that
> sometimes, something may happen and try to restart?
If there's a problem, restarting it probably won't solve it. We have
diagnostic tools like ktrace that are much more powerful than trying
to foresee everything that *could* go wrong and taking actions to
remedy in the daemon itself. Apart from that, there's the potential
problem of filling log filesystems and spawning processes like crazy.
So, no, if a daemon has a bug it should be fixed. Automatic restarting
hides the problem and potentially causes problems of its own.
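
If you do want a diagnostic in the parent, something along these lines
would report how a child ended without restarting it. This is only a
rough sketch and not taken from lpd: describe_exit() and
run_and_report() are hypothetical names, and a real daemon would pass
the message to syslog(3) instead of filling a buffer.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Turn a waitpid(2) status into a log message.
 * Returns 0 for a clean exit, 1 for anything else.
 */
int
describe_exit(pid_t pid, int status, char *buf, size_t len)
{
	if (WIFSIGNALED(status)) {
		snprintf(buf, len, "child %ld killed by signal %d",
		    (long)pid, WTERMSIG(status));
		return 1;
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
		snprintf(buf, len, "child %ld exited with status %d",
		    (long)pid, WEXITSTATUS(status));
		return 1;
	}
	snprintf(buf, len, "child %ld exited normally", (long)pid);
	return 0;
}

/*
 * Demo driver: fork a child that raises sig (or exits cleanly when
 * sig is 0), wait for it, and describe the result. Returns -1 if
 * fork or waitpid fails. Note that nothing gets restarted.
 */
int
run_and_report(int sig, char *buf, size_t len)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		if (sig != 0)
			raise(sig);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return describe_exit(pid, status, buf, len);
}
```

That gives one log line per dead child, so it cannot fill the log
filesystem or fork in a loop, and it points you at ktrace for the
real work.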
-Otto
>
> Up until recently (I've not yet taken a look at the new rc-scripting
> stuff) the way daemons were started suggested the former.
>
>
> Cheers,
> s//un
>
> --
> When I read about the evils of drinking, I gave up reading.
> -- Henry Youngman