To share the solution with everyone: the problem was fixed by a
configuration update.
Mike added "accepted_payload_size 1024" to his resolvers section.

By default, HAProxy announces an accepted payload size of 512 bytes, which
leaves room for only 3 of the records reported by consul.
With a payload of 1024 bytes, up to 10 servers can be reported by consul,
which is large enough for Mike.
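
As a sketch, the resulting resolvers section would look something like
this. It assumes the section Mike posted at the start of this thread
(quoted below) with only the accepted_payload_size line added; the
comment is added here for illustration:

```
resolvers dns
  nameserver dns1 ${HAPROXY_NAMESERVER}
  # announce a 1024-byte accepted DNS payload instead of 512
  accepted_payload_size 1024
  hold valid 2s
```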

HAProxy can announce up to 8K of accepted DNS payload. That said, while
troubleshooting Mike's case, I found a bug where HAProxy automatically
reduces the announced size to 1280 bytes under some conditions.
This bug is not related to Mike's case, but deserves a fix. I'll work on it
asap.

Baptiste


On Mon, Feb 12, 2018 at 10:17 AM, Baptiste <bed...@gmail.com> wrote:

> Continuing my investigation, I found another interesting piece of
> information:
> I run haproxy and my consul environment in a docker host, through
> docker-compose and I can reproduce the same issue as you.
> Basically, I have a service delivered by 20 containers, and HAProxy in
> docker can see only 10 of them and switches all their IPs all the time...
> That said, if I run the same HAProxy binary on my laptop, pointing its
> DNS resolvers to the consul client running in my docker host, everything
> works smoothly!!!
>
> In my case, there is one thing that might happen: docker drops DNS
> responses (UDP) that are too big, and my HAProxy falls back to 512 bytes,
> where only 10 SRV records can fit (consul also returns A and TXT records
> for each SRV response).
>
> I tested both the latest 1.8 and 1.9-dev and can report the same issue in
> both cases.
>
> Could you tell me more about your environment? (Drop the ML if there is
> too much sensitive information.)
>
> Baptiste
>
>
> On Mon, Feb 12, 2018 at 9:25 AM, Baptiste <bed...@gmail.com> wrote:
>
>> First, I confirm the following bug in consul 1.0.5:
>> - start X instances of a service
>> - scale the service to X+Y (with Y > 1)
>> ==> then consul crashes...
>> From time to time, I also saw HAProxy getting only 10 servers from 20 for
>> a given service.
>>
>> I'll revert to 1.0.2 for now.
>>
>> The order of the returned SRV records is ignored by HAProxy.
>> Can you confirm the number of servers associated with the service
>> 'mfm-monitor-opentsdb' in consul?
>> On the HAProxy box, can you run the following command and return the
>> output (obfuscating the IPs and other sensitive information)?
>>   dig +notcp @127.0.0.1 -p 8600 -t SRV _mfm-monitor-opentsdb._tcp.service.consul
>>
>> Baptiste
>>
>>
>>
>> On Mon, Feb 12, 2018 at 8:27 AM, Чепайкин Михаил <mchepay...@gmail.com>
>> wrote:
>>
>>> I'm on Consul 1.0.2.
>>>
>>> Why do you think this issue is about serving SRV over UDP, rather than
>>> about different order of SRV or A records returned by Consul DNS with
>>> consecutive requests?
>>>
>>> On 11 February 2018 at 18:46, Baptiste <bed...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> What consul version are you using?
>>>> I'm facing the same issue in my consul lab. That said, it seems to be a
>>>> bug in consul, which is not able to serve many SRV records over UDP.
>>>> I even triggered a consul crash (using 1.0.5 version).
>>>> I'm still investigating this issue and will come back to you as soon as
>>>> I have more reliable information.
>>>>
>>>> Note: please ensure the number of servers created by the server-template
>>>> directive (5 in your case) is above the expected number of servers
>>>> available in your service.
>>>>
>>>> Baptiste
>>>>
>>>>
>>>>
>>>> On Thu, Feb 8, 2018 at 12:32 AM, Чепайкин Михаил <mchepay...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I’ve changed configuration as you suggested:
>>>>>
>>>>> backend tsdb_backend_query
>>>>>   server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>
>>>>> The logs are somewhat different (backend servers now go UP and DOWN),
>>>>> but the problem seems the same: IP addresses keep changing in the same way:
>>>>>
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for 
>>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>>> (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>> 10.182.161.223 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy 
>>>>> pid=18208
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY 
>>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query1 
>>>>> ('0ab6a1d3.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for 
>>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>>> (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>> 10.182.161.98 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy 
>>>>> pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY 
>>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query3 
>>>>> ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for 
>>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>>> (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>> 10.182.161.223 to 10.182.161.98 by DNS cache." job=mfm-monitor-haproxy 
>>>>> pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY 
>>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query3 
>>>>> ('0ab6a162.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for 
>>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>>> (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>> 10.182.161.211 to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy 
>>>>> pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY 
>>>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query1 
>>>>> ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 
>>>>> (18208) : Server tsdb_backend_query/tsdb_query2 is going DOWN for 
>>>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0 
>>>>> sessions active, 0 requeued, 0 remaining in queue." 
>>>>> job=mfm-monitor-haproxy pid=18208
>>>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305 
>>>>> (18208) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>>> 10.182.161.163 to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy 
>>>>> pid=18208
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> On 8 February 2018 at 01:25, Baptiste <bed...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> You're not using SRV records and that may be the root cause of your
>>>>>> issue.
>>>>>> Please try something like this:
>>>>>>
>>>>>> backend tsdb_backend_query
>>>>>>   server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>>
>>>>>> if "mfm-monitor-opentsdb" is your service name in consul.
>>>>>>
>>>>>> Baptiste
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 7, 2018 at 2:52 PM, Чепайкин Михаил <mchepay...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I have Consul as a service discovery tool and HAProxy as a load
>>>>>>> balancer.
>>>>>>>
>>>>>>> A service is registered in Consul, running on a number of servers, and
>>>>>>> this service can be scaled by adding and removing nodes and by moving
>>>>>>> nodes from one server to another.
>>>>>>>
>>>>>>> Consul has a DNS service which randomizes responses for services, like
>>>>>>> this:
>>>>>>>
>>>>>>> [bux] michep@bux:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>>>> 10.182.161.239
>>>>>>> 10.182.161.152
>>>>>>> 10.182.161.240
>>>>>>> 10.182.161.92
>>>>>>> [bux] michep@bux:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>>>> 10.182.161.92
>>>>>>> 10.182.161.152
>>>>>>> 10.182.161.240
>>>>>>> 10.182.161.239
>>>>>>>
>>>>>>> In HAProxy 1.8.3 I'm using a server-template configuration, like this:
>>>>>>>
>>>>>>> resolvers dns
>>>>>>>   nameserver dns1 ${HAPROXY_NAMESERVER}
>>>>>>>   hold valid 2s
>>>>>>>
>>>>>>> backend tsdb_backend_query
>>>>>>>   server-template tsdb_query 5 mfm-monitor-opentsdb.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>>>
>>>>>>> And in that case I get a lot of warnings in the haproxy log:
>>>>>>>
>>>>>>> time="2018-02-02T15:44:32+03:00" level=info msg="[WARNING] 032/154432 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:44:42+03:00" level=info msg="[WARNING] 032/154442 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>>> 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:44:46+03:00" level=info msg="[WARNING] 032/154446 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>>> 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:44:50+03:00" level=info msg="[WARNING] 032/154450 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>>>>> 10.182.161.92 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:44:52+03:00" level=info msg="[WARNING] 032/154452 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>>> 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:44:56+03:00" level=info msg="[WARNING] 032/154456 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:45:00+03:00" level=info msg="[WARNING] 032/154500 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>>> 10.182.161.92 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:45:02+03:00" level=info msg="[WARNING] 032/154502 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>>> 10.182.161.240 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:45:04+03:00" level=info msg="[WARNING] 032/154504 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>>>>> 10.182.161.152 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:45:06+03:00" level=info msg="[WARNING] 032/154506 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from 
>>>>>>> 10.182.161.239 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:45:10+03:00" level=info msg="[WARNING] 032/154510 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>>> 10.182.161.92 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:45:18+03:00" level=info msg="[WARNING] 032/154518 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from 
>>>>>>> 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>> time="2018-02-02T15:45:20+03:00" level=info msg="[WARNING] 032/154520 
>>>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from 
>>>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy 
>>>>>>> pid=32983
>>>>>>>
>>>>>>> This doesn't really break the service, but I think it is not quite
>>>>>>> normal.
>>>>>>>
>>>>>>> Any advice on how to resolve this issue?
>>>>>>>
>>>>>>
>>> --
>>> Mike Chepaykin
>>>
>>>
>>
>
