Hi Willy,
> Not really. Maybe we should see how the state file parser works, because
> multiple seconds to parse only 30K lines seems extremely long.
I would even say multiple minutes :)
> I'm just thinking about a few things. Probably that among these 30K servers,
> most of them are in fact tracking other ones ? In this case it could make
> sense to have an option to only dump servers which are not tracking
> others, as for a reload it can make quite some sense. Is this the case
> for you ?
What do you mean by "tracking other ones"?
What I can say is that, for historical reasons, we named all servers the same
way in every backend (i.e. srvN) in the configuration template, and we use
"server templates" to add MAINT servers to the pool so that servers can be
added at runtime later.
This naming scheme can be changed now, but I don't know whether it could be
related to this issue or not.
Basically, here is what we do when we receive a new event:
* if it requires deleting, updating or adding server(s) in one or more pools,
we only use the runtime API and try to reuse free slots.
* if a backend/frontend has to be created, updated or deleted, OR if a given
backend has no free slots left, we reload using a configuration template.
* in Jinja2, this template looks like this (simplified):
backend be_foo
    <options>
{%- for server in servers %}
    server srv{{loop.index0}} {{server.address}}:{{server.port}} weight {{server.weight}}{%- if server.tls %} ssl{%- endif %} check port 8500
{%- endfor %}
    # Create 26 free slots, numbered from N to N+25 (inclusive), N being the number of servers above
    server-template srv {{ servers|length }}-{{ servers|length + 25 }} 0.0.0.0:0 check disabled
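To make the naming scheme concrete, here is a small Python model of the slots this template produces and of the runtime path described above. It is only a sketch, not our actual code. `Slot`, `render_slots`, `add_server`, `remove_server` and `active_addresses` are hypothetical names. The model also assumes that HAProxy auto-assigns server ids in declaration order starting at 1, which is consistent with 'srv28' having id '29' in the warning further down.

```python
from dataclasses import dataclass
from typing import List, Optional

# Mirrors the template range srvN..srvN+25: ranges are inclusive, so 26 slots.
FREE_SLOTS = 26


@dataclass
class Slot:
    name: str
    srv_id: int                    # assumption: auto-assigned in declaration order, from 1
    address: Optional[str] = None  # None = disabled placeholder (0.0.0.0:0)


def render_slots(addresses: List[str]) -> List[Slot]:
    """Name slots like the template: srv0..srv{n-1}, then FREE_SLOTS placeholders."""
    n = len(addresses)
    slots = [Slot(f"srv{i}", i + 1, addr) for i, addr in enumerate(addresses)]
    slots += [Slot(f"srv{j}", j + 1) for j in range(n, n + FREE_SLOTS)]
    return slots


def add_server(slots: List[Slot], address: str) -> Optional[str]:
    """Runtime path: fill the first free slot and return its name, or None if a reload is needed."""
    for slot in slots:
        if slot.address is None:
            slot.address = address
            return slot.name
    return None


def remove_server(slots: List[Slot], address: str) -> None:
    """Runtime path: give the slot back to the free pool (e.g. put it in MAINT)."""
    for slot in slots:
        if slot.address == address:
            slot.address = None


def active_addresses(slots: List[Slot]) -> List[str]:
    """What a reload would feed to the template: active servers, in slot order."""
    return [s.address for s in slots if s.address is not None]


if __name__ == "__main__":
    slots = render_slots(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
    print(add_server(slots, "10.0.0.4:80"))  # lands in the first template slot
    remove_server(slots, "10.0.0.1:80")      # srv0 becomes free
    # A reload re-renders the pool from scratch, so names are reassigned by position.
    reloaded = render_slots(active_addresses(slots))
    for old, new in zip(slots, reloaded):
        if old.address and old.address != new.address:
            print(old.name, old.address, "->", new.address)
```

In this model, every server placed after a removed one ends up under a different srvN name (and id) after the reload. That includes the one added at runtime in a template slot. So the name/id pairs recorded before a reload do not necessarily describe the same servers afterwards.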
Doing this, I noticed that we get a lot of 'bad reconciliations' triggering
warning logs, such as:
[WARNING] can't find server 'srv28' with id '29' in backend with id '9' or name 'be_test'
[WARNING] backend name mismatch: from server state file: 'be_foo', from running config 'be_bar'
I don't know whether these inconsistencies (which clearly have to be fixed) can
cause additional delays.
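To spot these mismatches before a reload rather than in the logs, a small check could compare the previous state file with the config about to be loaded. This is only an illustration. The `Config` mapping and `check_state_file` are hypothetical. The check assumes the version-1 state file layout: a version line, a '#' header, then one line per server starting with be_id, be_name, srv_id and srv_name. It imitates the two warnings above rather than reproducing HAProxy's own matching logic.

```python
from typing import Dict, List, Tuple

# Hypothetical view of the config about to be loaded:
# backend id -> (backend name, {server name: server id})
Config = Dict[int, Tuple[str, Dict[str, int]]]


def check_state_file(text: str, config: Config) -> List[str]:
    """List state-file entries that no longer match the config (sketch only)."""
    problems = []
    for line in text.splitlines()[1:]:  # skip the version line
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the column header
        be_id_s, be_name, srv_id_s, srv_name = line.split()[:4]
        be_id, srv_id = int(be_id_s), int(srv_id_s)
        if be_id not in config:
            problems.append(f"unknown backend id {be_id} ({be_name!r})")
            continue
        cfg_name, servers = config[be_id]
        if cfg_name != be_name:
            problems.append(
                f"backend name mismatch: state file {be_name!r}, config {cfg_name!r}")
        if servers.get(srv_name) != srv_id:
            problems.append(
                f"server {srv_name!r} with id {srv_id} not found in backend {cfg_name!r}")
    return problems
```

Because both backend and server ids follow declaration order in our setup, adding or removing a backend or a server between two reloads is enough to make this kind of check report entries.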
Thanks,
Pierre