Re: [PATCHES] 3 patches for DNS SRV records

Conrad Hoffmann Fri, 11 Aug 2017 06:47:58 -0700


On 08/11/2017 02:56 PM, Conrad Hoffmann wrote:
> Hi,
> 
> first of all: great to see that this is making progress! I am very excited
> about everything related to SRV records and also server-templates. I tested
> a fresh master build with these patches applied, here are my observations:
> 
> On 08/11/2017 11:10 AM, Baptiste Assmann wrote:
>> Hi All
>>
>> So, I enabled the latest (brilliant) contribution from Olivier in my
>> Kubernetes cluster and I discovered it did not work as expected.
>> After digging into the issues, I found 3 bugs directly related to the
>> way SRV records must be read and processed by HAProxy.
>> It was clearly hard to spot them outside a real orchestrator :)
>>
>> Please find in attachment 3 patches to fix them.
>>
>> Please note that I might have found another bug, which I'll dig into
>> later.
>> When "scaling in" (reducing an app footprint in kubernetes), HAProxy
>> considers some servers (pods in kubernetes) in error "no dns
>> resolution". This is normal. What is not normal is that those servers
>> never ever come back to life, even when I scale up again.
>>
>> Note that thanks to (Salut) Fred's contribution about server-templates
>> some time ago, we can do some very cool fancy configurations like the
>> one below: (I have a headless service called 'red' in my kubernetes, it
>> points to my 'red' application)
>>
>> backend red
>>   server-template red 20 _http._tcp.red.default.svc.cluster.local:8080 inter 1s resolvers kube check
>>
>> In one line, we can enable automatic "scaling follow-up" in HAProxy.
> 
> I tried a very similar setup, like this:
> 
>>  resolvers servicediscovery
>>    nameserver dns1 10.33.60.31:53
>>    nameserver dns2 10.33.19.32:53
>>    nameserver dns3 10.33.25.28:53
>>
>>    resolve_retries       3
>>    timeout retry         1s
>>    hold valid           10s
>>    hold obsolete         5s
>>
>>  backend testbackend
>>    server-template test 20 http.web.production.<internal-name>:80 check
> 
> This is the first time I am testing the server-template keyword at all, but
> I immediately noticed that I sometimes get a rather uneven distribution of
> pods, e.g. this (with the name resolving to 5 addresses):
> 
>> $ echo "show servers state testbackend" | \
>>    nc localhost 2305 | grep testbackend | \
>>    awk '{print $5}' | sort | uniq -c
>>      7 10.146.112.130
>>      6 10.146.148.92
>>      3 10.146.172.225
>>      4 10.146.89.208
> 
> This uses only four of the five servers, with a quite uneven distribution.
> Other attempts do use all five servers, but the distribution still
> seems pretty uneven most of the time. Is that intentional? Is the list
> populated randomly?
> 
> Then, nothing changed when I scaled up or down (except the health checks
> taking some servers offline); the addresses were never updated. Is that
> the bug you mentioned, or am I doing it wrong?


Ok, I am an idiot. I realized I forgot the `resolvers` keyword for the
`server-template` directive. So scaling and DNS updates actually work now,
which is already amazing. However, the distribution thing is still somewhat
of an issue.
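
A minimal sketch of what the corrected backend would look like, assuming
the `resolvers` keyword simply points at the `servicediscovery` section
quoted above (`<internal-name>` remains a placeholder, as in the original
config):

```
backend testbackend
  server-template test 20 http.web.production.<internal-name>:80 check resolvers servicediscovery
```

Without `resolvers` on the line, the name is only resolved once at startup,
which would match the "addresses were never updated" behaviour described
above.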


> Also, as more of a side note, we do use SRV records, but not underscores
> in the names, which I realize is not very common, but also not exactly
> forbidden (as far as I understand the RFC it's more of a suggestion). Would
> be great if this could be indicated in some way in the config maybe.
> 
> And lastly, I know this isn't going to be solved on a Friday afternoon, but
> I'll let you know that our infrastructure has reached a scale where DNS
> over UDP almost never cuts it anymore (due to the amount of records
> returned), and I think many people who are turning to e.g. Kubernetes do so
> because they have to operate at such scale, so my guess is this might be
> one of the more frequently requested features at some point :)
> 
> This is just "quick" feedback; depending on the time I have, I'll try to
> take a closer look at a few things and provide more details if possible.
> 
> Again, thanks a lot for working on this, let me know if you are interested
> in any specific details.
> 
> Thanks a lot,
> Conrad
> 

Conrad
-- 
Conrad Hoffmann
Traffic Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B
