[ https://issues.apache.org/jira/browse/AURORA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171678#comment-14171678 ]
Isaac Councill commented on AURORA-761:
---------------------------------------
I'll start by creating a pool of records since that's easy. Clients by
default get a list of all healthy backends for an app from consul, and they
can even filter on arbitrary tags provided by jobs, which could come in handy.
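To make the client side concrete, here's a minimal sketch of what a pool refresh could do with the response from Consul's /v1/health/service/&lt;service&gt; endpoint. The response shape follows Consul's documented format, but the service data and tag names below are illustrative, not from this ticket:

```python
# Sketch: build a backend pool from Consul health-endpoint entries,
# keeping only instances whose checks all pass, optionally filtered
# by a service tag. Sample data is made up for illustration.

def healthy_backends(entries, tag=None):
    """Return (host, port) pairs for entries whose checks all pass."""
    backends = []
    for entry in entries:
        svc = entry["Service"]
        if tag is not None and tag not in svc.get("Tags", []):
            continue  # client-side filtering on job-provided tags
        if all(c["Status"] == "passing" for c in entry["Checks"]):
            backends.append((svc["Address"], svc["Port"]))
    return backends

# Abridged entries in the shape Consul's health endpoint returns.
entries = [
    {"Service": {"Address": "10.0.0.1", "Port": 8080, "Tags": ["http"]},
     "Checks": [{"Status": "passing"}]},
    {"Service": {"Address": "10.0.0.2", "Port": 8080, "Tags": ["http"]},
     "Checks": [{"Status": "critical"}]},
]
print(healthy_backends(entries, tag="http"))  # → [('10.0.0.1', 8080)]
```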
As for moved backends: they can largely be handled exactly as they are by the
current Announcer, with the difference that reliance on ZK session timeouts to
remove ephemeral znodes must be replaced with consul checks. Consul agents
will be co-located with tasks on the mesos slaves, so it's possible to do any
kind of check, theoretically even making sure a PID exists in the local proc
table (needs thought). TTL checks are supported, which would be analogous to
the ZK session keepalive: in that scheme, clients post state to Consul and are
marked unhealthy if a post does not arrive within a specified timeout.
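For illustration, a TTL check in that scheme could be declared in the agent's service definition roughly like this (service name, port, and TTL value are placeholders, not from this ticket):

```json
{
  "service": {
    "name": "myapp",
    "port": 8080,
    "check": {
      "ttl": "30s"
    }
  }
}
```

The task (or a sidecar) would then periodically hit the agent's check-pass endpoint to keep the check green, much like a ZK session keepalive; missing the 30s window flips the service to unhealthy and drops it from client queries.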
However, health checks are only part of the story. It's great that unhealthy
jobs won't be served up in client requests, but we don't want thousands of
failed backends issuing vain health checks after a few weeks of task movement.
The main question is when to actually deregister a service. A great thing
about serversets is that deregistration happens automatically on session
timeout. I don't see any way to replicate that behavior (yet) with consul,
but I'm still learning.
Options I see:

the elephant) Integrate consul in the scheduler. My strong inclination is to
avoid that.

the lol) Trick the consul backend node into deregistering itself by having it
execute a health check script with a callback to the consul API on fatal
conditions (e.g., missing PID; again, more thought needed). I'd make sure I'm
not just ignorant of supported auto-deregistration features in Consul before
going there.

the final process) Do consul deregistration in a final process. Not sure how
robust that would be in the face of update/killall, but I haven't played
around with it much. Cleanest option so far.
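A rough sketch of "the lol" option, under stated assumptions: the check probes the task's PID and, on a fatal condition, calls back into the local Consul agent to deregister the service. The deregister endpoint follows Consul's agent HTTP API; the service id and agent address are made up for illustration:

```python
# Sketch of a self-deregistering health check (option "the lol").
# Assumes a local Consul agent; "myapp-task-0" is a hypothetical service id.
import os
import urllib.request

def pid_alive(pid):
    """True if a process with this PID exists (signal 0 probes, never kills)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user
    return True

def deregister(service_id, agent="http://127.0.0.1:8500"):
    """Ask the co-located Consul agent to drop the service registration."""
    req = urllib.request.Request(
        f"{agent}/v1/agent/service/deregister/{service_id}", method="PUT")
    urllib.request.urlopen(req)

def check(pid, service_id):
    """Health check body: deregister and fail if the task PID is gone."""
    if not pid_alive(pid):
        deregister(service_id)
        return False
    return True
```

Whether this is robust under update/killall is exactly the open question; it also duplicates work the agent arguably should do itself, which is why I'd first rule out a supported auto-deregistration feature.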
You're absolutely welcome to talk me out of this route. Auto-configuring
HAProxy directly from ZK would be clean and easy, as would writing non-DNS
clients to fetch records. DNS is cool, though, and consul provides some
pretty nice features. It also seems like it would be a network traffic win,
but I've got to kick consul around more to confirm that's the case.
> Provide a proxy for generic service discovery
> ---------------------------------------------
>
> Key: AURORA-761
> URL: https://issues.apache.org/jira/browse/AURORA-761
> Project: Aurora
> Issue Type: Story
> Components: Service Discovery, Usability
> Reporter: Bill Farner
> Priority: Minor
>
> While {{Announcer}} provides service registration, we lack a cross-cutting
> answer for service discovery. There are well-known libraries that will do it
> (e.g. finagle), but we need an answer for others. Marathon, for example,
> provides a script called {{haproxy_marathon_bridge}} that reloads
> configuration of HAProxy for this purpose. We could do something similar
> with a mixin {{Process}} that dynamically routes an inbound port to a
> serverset path in ZooKeeper.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)