Well, certainly aws has its limitations, which force you to design a very different infrastructure than you would in a normal datacenter environment. IMHO this is the great thing about those limitations: you are forced to start thinking differently, and you end up using a set of well-known and established tools to overcome them. I'm talking mainly about monitoring/automation/deployment tools & centralized coordination services - so that you can automatically react to any change in the infrastructure.

With those tools you don't really care if some server's IP changes - the IP only changes if you stop and then start an ec2 instance; if you just reboot it, the IP does not change. Normally you would not stop/start an instance anyway - that usually happens when something bad happens to the instance and a reboot does not help, since there might be a hardware problem on the physical server hosting the ec2 instance. So you stop it and then start it again, and it comes up on a different hardware server.

But you don't really need to do any of this manually. If some ec2 instance is sick, this is detected and propagated through the centralized coordination service to the relevant parties. Then you can decide to start the service from the failed instance on another already-running ec2 instance, or start a new instance which configures itself and starts the service. The old failed instance can just be killed or suspended. (So VPC or a normal datacenter will not help here, since the service will be running on a different instance/server with a different IP. Yes, you could use a floating IP in a normal datacenter, but you would not want to do that for every backend, especially when backends are automatically added/removed - you would normally use a floating IP only for the frontend.) Once the service is active again on another/new instance, this is again propagated through the centralized coordination service, and you automatically update the needed stuff on the relevant instances - in this specific case, update /etc/hosts and restart/reload haproxy. (All I wanted was to avoid the haproxy restart/reload - there is no technical problem at all with doing the restart.) And of course all this is done automatically, without human intervention.
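The last step described above (update /etc/hosts, then reload haproxy) can be sketched in shell. This is a minimal illustration, not the actual tooling used here: the config/pid paths are the common defaults, the hosts-file layout is assumed to be "one address, one name per line", and -sf is haproxy's soft-reload flag (the freshly started process re-reads the config and resolves hostnames via the libc resolver, hence /etc/hosts, while the old process drains its connections).

```shell
update_host_entry() {   # usage: update_host_entry <hosts-file> <name> <new-ip>
    hosts=$1; name=$2; ip=$3
    if grep -q "[[:space:]]$name\$" "$hosts"; then
        # rewrite the existing line for this name, keeping the whitespace
        sed -i "s/^[^[:space:]]\{1,\}\([[:space:]]\{1,\}$name\)\$/$ip\1/" "$hosts"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts"
    fi
}

reload_haproxy() {
    # -sf: start a fresh haproxy process and tell the old one (by pid)
    # to finish its existing connections and then exit
    haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
            -sf "$(cat /var/run/haproxy.pid)"
}

# example: the coordination service reports that backend1 moved
# update_host_entry /etc/hosts backend1 10.0.0.42
# reload_haproxy
```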

From where I stand I see no particular unreliability problem with aws - a normal datacenter is just as unreliable for me as aws is. I don't need the normal datacenter or the VPC. The use of those tools and the other aws features makes aws much more attractive and reliable than a normal datacenter.

The only really annoying thing about ec2 is that you can have only one IP per instance - this makes the HA stuff more difficult to implement, and you have to design it differently than in a normal datacenter. AFAIU the aws VPC would not help there either, since VPC instances can still have only one IP and/or you can't reassign it to another ec2 instance.

Alex

On 08/05/2011 11:53 PM, Hank A. Paulson wrote:
I think the problem here is that the EC2 way of doing automatic server replacement is directly opposite to the normal and sane patterns of doing server changes in other environments. So someone who is on EC2 only thinks this is a process to hook into and use, while others, like Willy, are thinking "wtf, why would you do this?" - I don't think there will be much common ground to be found.

Did someone already mention the idea of a soft restart after some external process notices a dns/ip mapping change? Does a soft restart (-sf) re-read the hosts file or redo server dns name lookups? Presumably, your instances should not restart so frequently that simple soft restarts would become a problem - afaik.

On 8/5/11 1:42 PM, Willy Tarreau wrote:
On Fri, Aug 05, 2011 at 11:11:50PM +0300, Piavlo wrote:
It's not a matter of config option. You're supposed to run haproxy
inside a chroot. It will then not have access to the resolver.
There are simple ways to make the resolver work inside chroot without
making the chroot less secure.

I don't know any such simple way. If you're in a chroot, you have no
FS access so you can't use resolv.conf, nsswitch.conf, nor even load
the dynamic libs that are needed for that. The only thing you can do
then is to implement your own resolver and maintain a second config
for this one. This is not what I call a simple way.

  I could ask the question the other direction : why try to resolve a
name to IP when a check fails, there is no reason why a server would
have its address changed without the admin being responsible for it.
I don't agree that the admin is supposed to be responsible for it directly
at all.

So you're saying that you find it normal that a *server* changes its IP
address without the admin's consent ? I'm sorry but we'll never reach
an agreement there.

Say a backend server crashes/enters a bad state - this is detected, and a
new ec2 instance is automatically spawned and autoconfigured to replace
the failed backend ec2 instance - which is optionally terminated.
The /etc/hosts of all relevant ec2 instances is auto-updated (or DNS
with a 60-second ttl is updated - by the way, the 60-second ttl works
great within ec2). There is no admin person involved - all is done
automatically.
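The DNS variant mentioned above could be driven with standard dynamic-DNS updates. As a sketch only: the zone and name-server names below are made up, while the batch syntax itself is that of BIND's nsupdate tool.

```shell
# Build an nsupdate batch that swaps the A record of a replaced backend
# to its new IP with a 60-second ttl. All names here are illustrative.
make_nsupdate_batch() {   # usage: make_nsupdate_batch <fqdn> <new-ip>
    fqdn=$1; ip=$2
    cat <<EOF
server ns1.example.internal
update delete $fqdn A
update add $fqdn 60 A $ip
send
EOF
}

# make_nsupdate_batch backend1.example.internal 10.0.0.42 | nsupdate -k /etc/ddns.key
```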

That's what I'm explaining from the beginning : this *process* is totally
broken and does not fit in any way in what I'd call common practices :

   - a failed server is replaced with another server with a different IP
     address. It could very well have kept the same IP address. If servers
     in datacenters had their IP address randomly changed upon every
     reboot, it would require many more men to handle them.

   - you're not even shocked that something changes the /etc/hosts of all
     of your servers when any server crashes. That's something I would
     never accept either. Of course, the only reason for this stupidity is
     the point above.

   - on top of that, the DNS is updated every 60 seconds. That means that
     any process detecting the failure faster than the DNS updates will
     act based on the old IP address and possibly never refresh it. Once
     again, this is an ugly design imposed by the first point.

I'm sorry Piavlo, but I can't accept such mechanisms. They are broken
from scratch, there is no other word. A server's admin should be the
only person who decides to change the server's address. Once you decide
to let stupid process change everything below you, you can't expect
some software to guess things for you and to automagically recover from
the mess.

Also, in your case it would not fix the issue : resolving when the
server goes down will bring you the old address, and only after
caches expires it would bring the new one.
If /etc/hosts is updated locally there is no need to wait for cache
expiration.

1) /etc/hosts is out of reach in a chroot
2) it's out of question to re-read /etc/hosts before every connection.
3) if you don't recheck before every connection, you can connect to the
    wrong place due to the time it takes to propagate changes.

And if /etc/hosts is auto-updated by the appropriate tool, then going one
more step and restarting/reloading haproxy is not a problem at all - but
that is what I want to avoid.

If you want to avoid this mess, simply configure your servers not to
change address with the phases of the moon.

If instead, for example, I could send a command to the haproxy control
socket to re-resolve all the names (or better, just a specific name)
configured in haproxy - that would be much better, since /etc/hosts is
already updated and it would resolve to the correct IP address.

It could not because it's not supposed to be present in the empty chroot.

BTW, afaiu adding/removing backends/frontends dynamically on the fly
through some api/socket is not something that is ever planned to be
supported in haproxy?

At the moment it's not planned because it requires dynamically changing
limits that are set upon startup, such as the max memory and max FD number.
Maybe in the future we'll be able to start with a configurable margin to
add some servers, but that's not planned right now. Changing a server's
address by hand might be much easier to implement though, even though it
will obviously break some protocols (eg: RDP). But it could fit your
use case.
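A runtime command along those lines would naturally live on the stats socket. Purely as a sketch of the idea being discussed - the command wording, backend/server names, and socket path below are hypothetical, and whether any such command exists depends on the haproxy version:

```shell
# Compose the hypothetical runtime command that would point an existing
# server at a new address; the (commented-out) pipe would feed it to the
# stats socket with socat.
set_server_addr_cmd() {   # usage: set_server_addr_cmd <backend/server> <new-ip>
    printf 'set server %s addr %s\n' "$1" "$2"
}

# set_server_addr_cmd bk_app/srv1 10.0.0.42 | socat stdio /var/run/haproxy.sock
```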

Regards,
Willy




