On 25/10/2016 01:21 AM, Willy Tarreau wrote:
> Hi guys,
> 
> On Tue, Oct 25, 2016 at 12:42:26AM +0200, Lukas Tribus wrote:
>> Not fixing *real world issues* because we don't agree with the use-case or
>> there is a design misconception somewhere else is dangerous. We don't have
>> to support every single obscure use-case out there, that's not what I am
>> saying, but systemd is a reality; as is docker and periodic reloads.
> (...)
> 
> Thank you both for your insights. There are indeed valid points in both
> of your arguments. I too am afraid of breaking things for people who do
> really abuse, but at the same time we cannot blame the users who get
> caught by systemd lying to them. I really don't care about people who
> would reload every 2 ms to be honest, but I'm concerned about people
> shooting themselves in the foot because they use very large configs or
> because, as you say Lukas, they accidentally run the command twice. This
> is something we used to deal with in the past, it's hard to lose this
> robustness. I've seen configs taking minutes to start (300k backends),
> reduced to a few seconds after the backends were moved to a binary
> tree. But these seconds remain a period of uncertainty and that's not
> nice for the user.
> 
> I think the patch I sent this evening covers both of your concerns. It's
> quite simple, relies on a process *closing* a file descriptor, which also
> covers a dying/crashing process (because I never trust any design consisting
> in saying "I promise I will tell you when I die"). And it doesn't
> significantly change things. I'm really interested in feedback on it.
> Pavlos, please honestly tell me if it really scares you and I'd like
> to address this (even if that means doing it differently). Let's consider
> that I want something backportable into HAPEE, you know that users there
> can be very demanding regarding reliability :-)
> 

Well, I have full confidence in the quality of your code (assuming you will
polish the patch to handle errors as you mentioned :-) ) and I am willing to
test it in our environment when it arrives in HAPEE. But we will never hit the
conditions that trigger this behavior, as our configuration tool for haproxy
doesn't allow reloading very often; we allow 1 reload per minute (this is
configurable, of course). We did that to also address the case of too many
live processes on a cluster of haproxies that has a lot of long-lived TCP
connections [1].


> I'm really open to suggestions. I absolutely despise systemd and each time
> I have to work on the wrapper I feel like I'm going to throw up. So for me
> working on this crap is a huge pain each time. But I'm really fed up with
> seeing people having problems in this crazy environment because one clueless
> guy decided that he knew better than all others how a daemon should reload,
> so whatever we can do to make our users' lives easier in captivity should
> at least be considered.
> 

Have you considered reporting this to systemd? Maybe they have a solution, I
don't know.

To sum up, go ahead with the patch, as it addresses real problems for users,
and you can count on me to test HAPEE in our environment.

Cheers,
Pavlos

[1] I have mentioned before that we balance rsyslog traffic from 20K clients,
and every time we reload haproxy we see old processes staying alive for days.
This is because the frontend/backend runs in TCP mode and the rsyslog daemon
on the clients doesn't terminate its connection until it is restarted; each
client opens a single long-lived TCP connection against the frontend. On
shutdown, haproxy can't close the connection the way it does in HTTP mode:
since it doesn't understand the protocol, it tries to play nice and graceful
with the clients and waits for them to close the connection.
