Continuing my investigation, I found another interesting piece of
information:
I run HAProxy and my Consul environment on a Docker host, through
docker-compose, and I can reproduce the same issue as you.
Basically, I have a service delivered by 20 containers, and HAProxy in
Docker sees only 10 of them and keeps switching their IPs all the time...
That said, if I run the same HAProxy binary on my laptop, pointing its DNS
resolvers to the Consul client running on my Docker host, everything works
smoothly!

In my case, one thing might be happening: Docker drops DNS responses (UDP)
that are too big, and my HAProxy falls back to 512 bytes, where only 10 SRV
records fit (Consul also returns A and TXT records for each SRV record in
the response).
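
One way to check this hypothesis is to send a plain UDP SRV query (no EDNS0,
so the 512-byte limit applies) and look at the response size and the TC
(truncated) flag. Below is a minimal Python sketch. The function names are
illustrative, and the server address and port are assumptions (127.0.0.1:8600,
as in the dig command quoted below; adjust for your setup). It only reads the
DNS header and does not parse the records themselves.

```python
import socket
import struct


def encode_qname(name):
    """Encode a domain name as DNS labels: length-prefixed, zero-terminated."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_srv_query(name, query_id=0x1234):
    """Build a plain SRV query without EDNS0 (classic 512-byte UDP limit)."""
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: QNAME, QTYPE=33 (SRV), QCLASS=1 (IN).
    question = encode_qname(name) + struct.pack("!HH", 33, 1)
    return header + question


def summarize(response):
    """Return size, TC flag and record counts from a raw DNS response."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    _id, flags, _qd, an, _ns, ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "size": len(response),
        "truncated": bool(flags & 0x0200),  # TC bit of the flags field
        "answers": an,
        "additional": ar,
    }


def query_udp(name, server="127.0.0.1", port=8600, timeout=2.0):
    """Send the query over UDP and summarize the reply header.

    The default server and port are assumptions; point them at the
    resolver HAProxy is configured to use.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_srv_query(name), (server, port))
        data, _ = sock.recvfrom(65535)
    return summarize(data)
```

Running query_udp("_mfm-monitor-opentsdb._tcp.service.consul") from inside the
HAProxy container and again from outside Docker, then comparing the "size",
"truncated" and "answers" values, should show whether the response is being
cut down or dropped on the way.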

I tested both the latest 1.8 and 1.9-dev and can report the same issue in
both cases.

Could you tell me more about your environment? (Drop the ML if there is too
much sensitive information.)

Baptiste


On Mon, Feb 12, 2018 at 9:25 AM, Baptiste <bed...@gmail.com> wrote:

> First, I confirm the following bug in consul 1.0.5:
> - start X instances of a service
> - scale the service to X+Y (with Y > 1)
> ==> then consul crashes...
> From time to time, I also saw HAProxy getting only 10 servers from 20 for
> a given service.
>
> I'll revert to 1.0.2 for now.
>
> The order of the returned SRV records is ignored by HAProxy.
> Can you confirm the number of servers associated with the service
> 'mfm-monitor-opentsdb' in consul?
> On the HAProxy box, can you run the following command and return the
> output (obfuscating the IPs and other sensitive information):
>   dig +notcp @127.0.0.1 -p 8600 -t SRV _mfm-monitor-opentsdb._tcp.service.consul
>
> Baptiste
>
>
>
> On Mon, Feb 12, 2018 at 8:27 AM, Чепайкин Михаил <mchepay...@gmail.com>
> wrote:
>
>> I'm on Consul 1.0.2.
>>
>> Why do you think this issue is about serving SRV over UDP, rather than
>> about different order of SRV or A records returned by Consul DNS with
>> consecutive requests?
>>
>> On 11 February 2018 at 18:46, Baptiste <bed...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> What consul version are you using?
>>> I'm facing the same issue in my consul lab. That said, it seems to be a
>>> bug in consul, which is unable to serve a large number of SRV records
>>> over UDP.
>>> I even triggered a consul crash (using version 1.0.5).
>>> I'm still investigating this issue and will come back to you as soon as
>>> I have more reliable information.
>>>
>>> Note: please ensure the number of servers created by the server-template
>>> directive (5 in your case) is above the expected number of servers
>>> available in your service.
>>>
>>> Baptiste
>>>
>>>
>>>
>>> On Thu, Feb 8, 2018 at 12:32 AM, Чепайкин Михаил <mchepay...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I’ve changed configuration as you suggested:
>>>>
>>>> backend tsdb_backend_query
>>>>   server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>
>>>> The logs are somewhat different: backend servers now go UP and DOWN, but
>>>> the behaviour seems the same, with IP addresses changing in the same way:
>>>>
>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>> (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for 
>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>> (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>> 10.182.161.223 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy 
>>>> pid=18208
>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>> (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY 
>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>> (18208) : Server tsdb_backend_query/tsdb_query1 
>>>> ('0ab6a1d3.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>> (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for 
>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>> (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.98 
>>>> to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>> (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY 
>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>> (18208) : Server tsdb_backend_query/tsdb_query3 
>>>> ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>> (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for 
>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>> (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>> 10.182.161.223 to 10.182.161.98 by DNS cache." job=mfm-monitor-haproxy 
>>>> pid=18208
>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>> (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY 
>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>> (18208) : Server tsdb_backend_query/tsdb_query3 
>>>> ('0ab6a162.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>> (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for 
>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>> (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>> 10.182.161.211 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy 
>>>> pid=18208
>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>> (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY 
>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>> (18208) : Server tsdb_backend_query/tsdb_query1 
>>>> ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 
>>>> (18208) : Server tsdb_backend_query/tsdb_query2 is going DOWN for 
>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>> job=mfm-monitor-haproxy pid=18208
>>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 
>>>> (18208) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>> 10.182.161.163 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy 
>>>> pid=18208
>>>>
>>>> Any thoughts?
>>>>
>>>> On 8 February 2018 at 01:25, Baptiste <bed...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> You're not using SRV records and that may be the root cause of your
>>>>> issue.
>>>>> Please try something like this:
>>>>>
>>>>> backend tsdb_backend_query
>>>>>   server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>
>>>>> if "mfm-monitor-opentsdb" is your service name in consul.
>>>>>
>>>>> Baptiste
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 7, 2018 at 2:52 PM, Чепайкин Михаил <mchepay...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I have a Consul as service discovery tool and HAProxy as load
>>>>>> balancer.
>>>>>>
>>>>>> A service is registered in Consul, running on a number of servers, and
>>>>>> this service can be scaled by adding and removing nodes and by moving
>>>>>> nodes from one server to another.
>>>>>>
>>>>>> Consul has a DNS service which randomizes responses for services, like
>>>>>> this:
>>>>>>
>>>>>> [bux] michep@bux:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>>> 10.182.161.239
>>>>>> 10.182.161.152
>>>>>> 10.182.161.240
>>>>>> 10.182.161.92
>>>>>> [bux] michep@bux:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>>> 10.182.161.92
>>>>>> 10.182.161.152
>>>>>> 10.182.161.240
>>>>>> 10.182.161.239
>>>>>>
>>>>>> In HAProxy 1.8.3 I'm using a server-template configuration, like this:
>>>>>>
>>>>>> resolvers dns
>>>>>>   nameserver dns1 ${HAPROXY_NAMESERVER}
>>>>>>   hold valid 2s
>>>>>>
>>>>>> backend tsdb_backend_query
>>>>>>   server-template tsdb_query 5 mfm-monitor-opentsdb.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>>
>>>>>> And in that case I get a lot of warnings in the HAProxy log:
>>>>>>
>>>>>> time="2018-02-02T15:44:32+03:00" level=info msg="[WARNING] 032/154432 
>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:44:42+03:00" level=info msg="[WARNING] 032/154442 
>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>> 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:44:46+03:00" level=info msg="[WARNING] 032/154446 
>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>> 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:44:50+03:00" level=info msg="[WARNING] 032/154450 
>>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>>>> 10.182.161.92 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:44:52+03:00" level=info msg="[WARNING] 032/154452 
>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>> 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:44:56+03:00" level=info msg="[WARNING] 032/154456 
>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:45:00+03:00" level=info msg="[WARNING] 032/154500 
>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>> 10.182.161.92 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:45:02+03:00" level=info msg="[WARNING] 032/154502 
>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>> 10.182.161.240 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:45:04+03:00" level=info msg="[WARNING] 032/154504 
>>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>>>> 10.182.161.152 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:45:06+03:00" level=info msg="[WARNING] 032/154506 
>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>> 10.182.161.239 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:45:10+03:00" level=info msg="[WARNING] 032/154510 
>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>> 10.182.161.92 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:45:18+03:00" level=info msg="[WARNING] 032/154518 
>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>> 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>> time="2018-02-02T15:45:20+03:00" level=info msg="[WARNING] 032/154520 
>>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>> pid=32983
>>>>>>
>>>>>> This doesn't really break the service, but I think it is not quite
>>>>>> normal.
>>>>>>
>>>>>> Any advice on how to resolve this issue?
>>>>>>
>>>>>
>> --
>> Mike Chepaykin
>>
>>
>
