[ https://issues.apache.org/jira/browse/AURORA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171678#comment-14171678 ]

Isaac Councill commented on AURORA-761:
---------------------------------------

I'll start by creating a pool of records, since that's easy. By default,
clients get a list of all healthy backends for an app from Consul. Clients can
even filter on arbitrary tags provided by jobs, which could come in handy.
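For illustration, a minimal sketch of the client side, assuming a Consul agent
listening on its default HTTP port (8500) on localhost. It queries the agent's
`/v1/health/service/<name>` endpoint with the `passing` and `tag` query
parameters; the service name `web`, the tag `canary`, and the helper names are
hypothetical.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: a Consul agent's HTTP API is reachable on its default port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def health_url(service: str, tag: Optional[str] = None) -> str:
    """Build a /v1/health/service query that returns only passing instances."""
    params = {"passing": "true"}
    if tag is not None:
        params["tag"] = tag  # optional filter on a job-supplied tag
    return "%s/v1/health/service/%s?%s" % (CONSUL_AGENT, quote(service), urlencode(params))


def backends(entries: list) -> List[Tuple[str, int]]:
    """Reduce a /v1/health/service response to (address, port) pairs."""
    result = []
    for entry in entries:
        svc = entry["Service"]
        # Service.Address is empty when the service registered without one;
        # fall back to the address of the node running the agent.
        address = svc.get("Address") or entry["Node"]["Address"]
        result.append((address, svc["Port"]))
    return result


def fetch_backends(service: str, tag: Optional[str] = None) -> List[Tuple[str, int]]:
    """Query the local agent (requires a running Consul agent)."""
    with urlopen(health_url(service, tag)) as resp:
        return backends(json.load(resp))
```

DNS-based clients would get the same tag filtering by resolving names of the
form `canary.web.service.consul`.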

As for moved backends: they can largely be handled exactly as in the current
Announcer, the difference being that reliance on ZK session timeouts to remove
ephemeral znodes must be replaced with Consul checks. Consul agents will be
colocated with tasks on the Mesos slaves, so it's possible to do any kind of
check, theoretically even making sure a PID exists in the local proc table
(needs thought). TTL checks are supported, which would be analogous to the ZK
session keepalive: in that scheme, clients would post state to Consul and be
marked unhealthy if a post does not arrive within a specified timeout.
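A rough sketch of the TTL scheme against Consul's agent HTTP API, assuming a
local agent on port 8500: register the service with a `TTL` check, then keep
marking that check as passing. The service ID `web-0`, the 15s TTL, and the
`is_healthy` callback are hypothetical; recent Consul releases expect `PUT` on
these agent endpoints.

```python
import json
import time
from urllib.request import Request, urlopen

# Assumption: a Consul agent runs locally on its default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def register_request(service_id, name, port, ttl="15s"):
    """Build a registration for a service whose health is driven by a TTL check."""
    body = {
        "ID": service_id,
        "Name": name,
        "Port": port,
        # The agent marks the check critical if no update arrives within TTL.
        "Check": {"TTL": ttl},
    }
    return Request(
        CONSUL_AGENT + "/v1/agent/service/register",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def heartbeat_request(service_id):
    """Build a request marking the service's TTL check as passing."""
    # A single check declared in a service registration gets ID "service:<ID>".
    return Request(
        CONSUL_AGENT + "/v1/agent/check/pass/service:" + service_id,
        method="PUT",
    )


def heartbeat_loop(service_id, interval_s, is_healthy):
    """Post passes while is_healthy() holds, then stop and let the TTL lapse."""
    while is_healthy():
        urlopen(heartbeat_request(service_id)).close()
        time.sleep(interval_s)
```

As with a ZK session timeout, the heartbeat interval should sit well inside the
TTL (say, a third of it) so one slow post doesn't flip the instance to
critical. An expired TTL only marks the instance critical, though; the
registration itself stays with the agent.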

However, health checks are only part of the story. It's great that unhealthy
jobs won't be served up in client requests, but we don't want thousands of
failed backends lingering in Consul, each still being health-checked in vain,
after a few weeks of task movement. The main question is when to actually
deregister a service. A great thing about serversets is that deregistration
happens automatically when the ZK session times out. I don't see any way to
replicate that behavior (yet) with Consul, but I'm still learning.

Options I see:

the elephant) Integrate Consul into the scheduler. My strong inclination is to
avoid that.

the lol) Trick the Consul agent on the backend node into deregistering the
service itself, by having it execute a health check script that calls back to
the Consul API on fatal conditions (e.g., a missing PID; again, more thought
needed). Before going there, I would make sure I'm not just ignorant of
auto-deregistration features Consul already supports.
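A sketch of what such a script could look like, assuming it is registered as a
Consul script check (Consul maps exit code 0 to passing, 1 to warning, and
anything else to critical; newer agents only run script checks when
`enable_script_checks` is set) and that the service ID and task PID are passed
on the command line. The argument layout and helper names are hypothetical, and
the PID probe uses `os.kill(pid, 0)`, which only tests for existence.

```python
import os
import sys
from urllib.request import Request, urlopen

# Assumption: the colocated Consul agent listens on its default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"

# Exit codes Consul interprets for script checks.
PASSING, CRITICAL = 0, 2


def pid_alive(pid):
    """True if a process with this PID exists in the local process table."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def deregister_request(service_id):
    return Request(CONSUL_AGENT + "/v1/agent/service/deregister/" + service_id, method="PUT")


def run_check(service_id, pid, send=lambda req: urlopen(req).close()):
    """Pass while the PID exists; deregister the service once it is gone."""
    if pid_alive(pid):
        return PASSING
    # Fatal condition: the task process is gone, so remove the registration
    # rather than leave a permanently critical entry behind.
    send(deregister_request(service_id))
    return CRITICAL


if __name__ == "__main__":
    # Hypothetical invocation: check.py <service-id> <pid>
    sys.exit(run_check(sys.argv[1], int(sys.argv[2])))
```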

the final process) Do Consul deregistration in a final process. I'm not sure
how robust that would be in the face of update/killall, since I haven't played
around with it much, but it's the cleanest option so far.
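In Aurora, a `Process` declared with `final=True` is run by Thermos as a
finalizer after the regular processes exit, including when the task is killed
(within the task's `finalization_wait` budget). Below is a sketch of a
deregistration script such a process could run, assuming a colocated Consul
agent on port 8500; the command-line layout and the retry parameters are
hypothetical.

```python
import sys
import time
from urllib.error import URLError
from urllib.request import Request, urlopen

# Assumption: a Consul agent is colocated on the slave, default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def deregister(service_id, attempts=3, delay_s=1.0, send=None):
    """Best-effort deregistration, meant to run as the task's final process."""
    if send is None:
        send = lambda req: urlopen(req, timeout=5).close()
    req = Request(CONSUL_AGENT + "/v1/agent/service/deregister/" + service_id, method="PUT")
    for attempt in range(1, attempts + 1):
        try:
            send(req)
            return True
        except URLError:
            if attempt < attempts:
                time.sleep(delay_s)  # agent may be briefly unavailable; retry
    return False  # give up; the stale entry remains for a health check to flag


if __name__ == "__main__":
    # Hypothetical invocation: deregister.py <service-id>
    sys.exit(0 if deregister(sys.argv[1]) else 1)
```

The retry covers a briefly unavailable agent, but nothing here runs if the
slave itself dies, so a stale entry can still be left behind in that case.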

You're absolutely welcome to talk me out of this route. Auto-configuring
HAProxy directly from ZK would be so clean and easy, as would writing non-DNS
clients to get records. DNS is cool, though, and Consul provides some pretty
nice features. It also seems like it would be a win for network traffic, but I
need to kick the tires on Consul more to find out for sure.
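For comparison, the ZK route could look roughly like this: decode the
serverset member znodes that `Announcer` writes and render an HAProxy `listen`
section from them. This sketch assumes the member data is the usual serverset
JSON (`serviceEndpoint` with `host`/`port`, plus `status`); reading the znodes
(for example with kazoo's `get_children`/`get` and a `ChildrenWatch` to trigger
re-rendering) is left out, and the section name, bind port, and server naming
are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def alive_endpoints(member_payloads: Iterable[bytes]) -> List[Tuple[str, int]]:
    """Decode serverset member znode data into (host, port) pairs."""
    endpoints = []
    for raw in member_payloads:
        member = json.loads(raw)
        if member.get("status", "ALIVE") != "ALIVE":
            continue  # skip members not announced as ALIVE
        ep = member["serviceEndpoint"]
        endpoints.append((ep["host"], ep["port"]))
    return sorted(endpoints)  # stable order: same members, same config text


def haproxy_listen(name: str, bind_port: int, endpoints: List[Tuple[str, int]]) -> str:
    """Render one HAProxy 'listen' section routing bind_port to the endpoints."""
    lines = [
        "listen %s" % name,
        "  bind 0.0.0.0:%d" % bind_port,
        "  mode tcp",
        "  balance roundrobin",
    ]
    for i, (host, port) in enumerate(endpoints):
        lines.append("  server %s_%d %s:%d check" % (name, i, host, port))
    return "\n".join(lines) + "\n"
```

Because identical member sets produce identical text, a caller can skip
reloading HAProxy when nothing changed, and otherwise reload it gracefully
with `-sf <old pids>`.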


> Provide a proxy for generic service discovery
> ---------------------------------------------
>
>                 Key: AURORA-761
>                 URL: https://issues.apache.org/jira/browse/AURORA-761
>             Project: Aurora
>          Issue Type: Story
>          Components: Service Discovery, Usability
>            Reporter: Bill Farner
>            Priority: Minor
>
> While {{Announcer}} provides service registration, we lack a cross-cutting 
> answer for service discovery.  There are well-known libraries that will do it 
> (e.g. finagle), but we need an answer for others.  Marathon, for example, 
> provides a script called {{haproxy_marathon_bridge}} that reloads 
> configuration of HAProxy for this purpose.  We could do something similar 
> with a mixin {{Process}} that dynamically routes an inbound port to a 
> serverset path in ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
