First, I confirm the following bug in consul 1.0.5:
- start X instances of a service
- scale the service to X+Y (with Y > 1)
==> then consul crashes...

From time to time, I also saw HAProxy getting only 10 servers out of 20 for a
given service.
I'll revert to 1.0.2 for now.

The order of the returned SRV records is ignored by HAProxy.

Can you confirm the number of servers associated with the service
'mfm-monitor-opentsdb' in consul?
On the HAProxy box, can you run the following command and return the output
(obfuscating the IPs and other sensitive information):

  dig +notcp @127.0.0.1 -p 8600 -t SRV _mfm-monitor-opentsdb._tcp.service.consul

Baptiste

On Mon, Feb 12, 2018 at 8:27 AM, Чепайкин Михаил <[email protected]> wrote:

> I'm on Consul 1.0.2.
>
> Why do you think this issue is about serving SRV over UDP, rather than
> about a different order of SRV or A records returned by Consul DNS with
> consecutive requests?
>
> On 11 February 2018 at 18:46, Baptiste <[email protected]> wrote:
>
>> Hi,
>>
>> What consul version are you using?
>> I'm facing the same issue in my consul lab. That said, it seems to be a
>> bug in consul, which is not able to serve too many SRV records over UDP.
>> I even triggered a consul crash (using version 1.0.5).
>> I'm still investigating this issue and will come back to you as soon as I
>> have more reliable information.
>>
>> Note: please ensure the number of servers created by the server-template
>> directive (5 in your case) is above the expected number of servers available
>> in your service.
>>
>> Baptiste
>>
>>
>>
>> On Thu, Feb 8, 2018 at 12:32 AM, Чепайкин Михаил <[email protected]>
>> wrote:
>>
>>> Hi
>>>
>>> I’ve changed the configuration as you suggested:
>>>
>>> backend tsdb_backend_query
>>>   server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>
>>> The logs are somewhat different: backend servers now go UP and DOWN, but the
>>> behavior seems the same, with IP addresses changing in the same way:
>>>
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253
>>> (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for
>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0
>>> sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy
>>> pid=18208
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253
>>> (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.223
>>> to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253
>>> (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY
>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:53+03:00" level=info msg="[WARNING] 038/021253
>>> (18208) : Server tsdb_backend_query/tsdb_query1
>>> ('0ab6a1d3.addr.dc1.mfmconsul') is UP/READY (resolves again)."
>>> job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255
>>> (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for
>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0
>>> sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy
>>> pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255
>>> (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.98
>>> to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255
>>> (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY
>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:55+03:00" level=info msg="[WARNING] 038/021255
>>> (18208) : Server tsdb_backend_query/tsdb_query3
>>> ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)."
>>> job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257
>>> (18208) : Server tsdb_backend_query/tsdb_query3 is going DOWN for
>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0
>>> sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy
>>> pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257
>>> (18208) : tsdb_backend_query/tsdb_query3 changed its IP from 10.182.161.223
>>> to 10.182.161.98 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257
>>> (18208) : Server tsdb_backend_query/tsdb_query3 administratively READY
>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:12:57+03:00" level=info msg="[WARNING] 038/021257
>>> (18208) : Server tsdb_backend_query/tsdb_query3
>>> ('0ab6a162.addr.dc1.mfmconsul') is UP/READY (resolves again)."
>>> job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301
>>> (18208) : Server tsdb_backend_query/tsdb_query1 is going DOWN for
>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0
>>> sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy
>>> pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301
>>> (18208) : tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.211
>>> to 10.182.161.223 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301
>>> (18208) : Server tsdb_backend_query/tsdb_query1 administratively READY
>>> thanks to valid DNS answer." job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:01+03:00" level=info msg="[WARNING] 038/021301
>>> (18208) : Server tsdb_backend_query/tsdb_query1
>>> ('0ab6a1df.addr.dc1.mfmconsul') is UP/READY (resolves again)."
>>> job=mfm-monitor-haproxy pid=18208
>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305
>>> (18208) : Server tsdb_backend_query/tsdb_query2 is going DOWN for
>>> maintenance (No IP for server ). 2 active and 0 backup servers left. 0
>>> sessions active, 0 requeued, 0 remaining in queue." job=mfm-monitor-haproxy
>>> pid=18208
>>> time="2018-02-08T02:13:05+03:00" level=info msg="[WARNING] 038/021305
>>> (18208) : tsdb_backend_query/tsdb_query2 changed its IP from 10.182.161.163
>>> to 10.182.161.211 by DNS cache." job=mfm-monitor-haproxy pid=18208
>>>
>>> Any thoughts?
>>>
>>> On 8 February 2018 at 01:25, Baptiste <[email protected]> wrote:
>>>
>>> Hi
>>>>
>>>> You're not using SRV records and that may be the root cause of your
>>>> issue.
>>>> Please try something like this:
>>>>
>>>> backend tsdb_backend_query
>>>>   server-template tsdb_query 5 _mfm-monitor-opentsdb._tcp.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>
>>>> if "mfm-monitor-opentsdb" is your service name in consul.
>>>>
>>>> Baptiste
>>>>
>>>>
>>>>
>>>> On Wed, Feb 7, 2018 at 2:52 PM, Чепайкин Михаил <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I have Consul as a service discovery tool and HAProxy as a load balancer.
>>>>>
>>>>> A service running on a number of servers is registered in Consul, and
>>>>> this service can be scaled by adding and removing nodes and by moving
>>>>> nodes from one server to another.
>>>>>
>>>>> Consul has a DNS service which randomizes responses for services, like
>>>>> this:
>>>>>
>>>>> [bux] michep@bux:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>> 10.182.161.239
>>>>> 10.182.161.152
>>>>> 10.182.161.240
>>>>> 10.182.161.92
>>>>> [bux] michep@bux:~$ dig +short mfm-monitor-opentsdb.service.mfmconsul
>>>>> 10.182.161.92
>>>>> 10.182.161.152
>>>>> 10.182.161.240
>>>>> 10.182.161.239
>>>>>
>>>>> In HAProxy 1.8.3 I'm using a server-template configuration like this:
>>>>>
>>>>> resolvers dns
>>>>>   nameserver dns1 ${HAPROXY_NAMESERVER}
>>>>>   hold valid 2s
>>>>>
>>>>> backend tsdb_backend_query
>>>>>   server-template tsdb_query 5 mfm-monitor-opentsdb.service.mfmconsul:4242 check resolvers dns inter 1000
>>>>>
>>>>> In that case I get a lot of warnings in the haproxy log:
>>>>>
>>>>> time="2018-02-02T15:44:32+03:00" level=info msg="[WARNING] 032/154432
>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from
>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:44:42+03:00" level=info msg="[WARNING] 032/154442
>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from
>>>>> 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:44:46+03:00" level=info msg="[WARNING] 032/154446
>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from
>>>>> 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:44:50+03:00" level=info msg="[WARNING] 032/154450
>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from
>>>>> 10.182.161.92 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:44:52+03:00" level=info msg="[WARNING] 032/154452
>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from
>>>>> 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:44:56+03:00" level=info msg="[WARNING] 032/154456
>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from
>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:45:00+03:00" level=info msg="[WARNING] 032/154500
>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from
>>>>> 10.182.161.92 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:45:02+03:00" level=info msg="[WARNING] 032/154502
>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from
>>>>> 10.182.161.240 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:45:04+03:00" level=info msg="[WARNING] 032/154504
>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from
>>>>> 10.182.161.152 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:45:06+03:00" level=info msg="[WARNING] 032/154506
>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from
>>>>> 10.182.161.239 to 10.182.161.152 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:45:10+03:00" level=info msg="[WARNING] 032/154510
>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from
>>>>> 10.182.161.92 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:45:18+03:00" level=info msg="[WARNING] 032/154518
>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from
>>>>> 10.182.161.239 to 10.182.161.92 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>> time="2018-02-02T15:45:20+03:00" level=info msg="[WARNING] 032/154520
>>>>> (32983) : tsdb_backend_query/tsdb_query2 changed its IP from
>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy
>>>>> pid=32983
>>>>>
>>>>> This doesn't really break the service, but I think it is not quite
>>>>> normal.
>>>>>
>>>>> Any advice on how to resolve this issue?
>>>>>
>>>>
> --
> Mike Chepaykin
>
>
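
For anyone wanting to quantify the "changed its IP ... by DNS cache" warnings quoted in this thread, here is a minimal Python sketch that tallies them per server slot. It is only an illustration: the function name count_ip_flaps is hypothetical, and it assumes the warnings keep the wording shown in the logs above. It strips leading ">" quote markers and re-joins wrapped lines before matching.

```python
import re
from collections import Counter

# Matches HAProxy warnings of the form quoted in the thread, e.g.
#   tsdb_backend_query/tsdb_query1 changed its IP from 10.182.161.240
#   to 10.182.161.239 by DNS cache.
FLAP_RE = re.compile(
    r"(?P<server>\S+/\S+) changed its IP from "
    r"(?P<old>\d+(?:\.\d+){3}) to (?P<new>\d+(?:\.\d+){3}) by DNS cache"
)

# Leading mail quote markers such as ">>>>> " at the start of a line.
QUOTE_RE = re.compile(r"^[ \t]*>+ ?", re.MULTILINE)


def count_ip_flaps(log_text):
    """Return (flaps per server slot, set of all IPs seen) for the log text."""
    # Drop quote markers, then collapse all whitespace so that log entries
    # wrapped across several lines still match the pattern.
    unquoted = QUOTE_RE.sub("", log_text)
    flat = " ".join(unquoted.split())
    flaps = Counter()
    ips = set()
    for match in FLAP_RE.finditer(flat):
        flaps[match.group("server")] += 1
        ips.update((match.group("old"), match.group("new")))
    return flaps, ips


if __name__ == "__main__":
    # Three entries copied from the logs in this thread, quoting included.
    sample = """
>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from
>>>>> 10.182.161.240 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy
>>>>> (32983) : tsdb_backend_query/tsdb_query1 changed its IP from
>>>>> 10.182.161.239 to 10.182.161.240 by DNS cache." job=mfm-monitor-haproxy
>>>>> (32983) : tsdb_backend_query/tsdb_query3 changed its IP from
>>>>> 10.182.161.152 to 10.182.161.239 by DNS cache." job=mfm-monitor-haproxy
"""
    flaps, ips = count_ip_flaps(sample)
    for server, count in sorted(flaps.items()):
        print(server, count)
    print(sorted(ips))
```

If the set of IPs seen stays the same while the per-slot counts keep growing, that would suggest the slots are being reshuffled among the same backends rather than following real scaling events.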

