On Thu, Jun 21, 2018 at 12:29 AM, Jim Deville <jdevi...@malwarebytes.com> wrote:
> Attaching an anonymized PCAP from yesterday. The first two packets are the
> request and response for 4 servers, the second pair is the request and
> response for 3. The 3-server response parses successfully, and Jonathan was
> able to find that the 4-server response ends up hitting here
> https://github.com/haproxy/haproxy/blob/master/src/dns.c#L425.
>
> I'd be happy for any workaround or explanation of what we could do
> differently, and happy to help get more info, or to try out a patch in our
> environment to confirm a fix if this is a bug as it seems.
>
> Jim
>
> ------------------------------
> *From:* Jim Deville
> *Sent:* Tuesday, June 19, 2018 6:00:07 PM
> *To:* haproxy@formilux.org
> *Cc:* Jonathan Works
> *Subject:* Issue with parsing DNS from AWS
>
> We have a setup with ECS and AWS's Service Discovery being load balanced
> by HAProxy in order to support sticky sessions for WebSocket handshakes,
> and we're working on making it more efficient by upgrading to 1.8.9 and
> taking advantage of seamless reloads and DNS service discovery. We have a
> solution almost working, however, we're seeing an issue during scaling when
> the DNS response crosses a certain size.
>
> We're using the following config (anonymized):
> https://gist.github.com/jredville/523de951d5ab6b60a0d345516bcf46d4
>
> What we're seeing is:
> * if we bring up 3 target servers, they come up as healthy, and traffic
>   is routed appropriately. If we restart haproxy, it comes up healthy
> * if we then scale to 4 or more servers, the 4th and additional are
>   never recognized, however, the first 3 stay healthy
> * if we restart haproxy with 4 or more servers, no servers come up
>   healthy
>
> We've attempted to modify the init-addr setting, accepted_payload_size,
> check options, and we've tried with and without a server-template and this
> is the behavior we consistently see. If we run strace over haproxy, we see
> it making the DNS requests but never updating the state of the servers. At
> this point we're not sure if we have something wrong in config or if there
> is a bug in how haproxy parses responses from AWS. Jonathan (cc'd) has
> pcaps if that would be helpful as well.
>
> Thanks,
> Jim
>

Hi guys,

Thanks for the report and the troubleshooting already done.

Something that would help me a lot is to be able to reproduce the issue.
2 options from here: either you provide the smallest terraform script which
allows reproducing the platform or you provide me access to a temporary
platform so I could troubleshoot live.
(we can carry on this conversation off-list of course).

Baptiste