Hi Eric,
Actually, as we imagined it, a "generation" is created only when a new
configuration is applied - when the "consistent hash" is permanently
modified. Transient health changes would therefore not create one.
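
To make the distinction concrete, here is a minimal Python sketch. It is
only an illustration under my own assumptions, not the algorithm we have in
mind and not an existing Traffic Control API: the DeliveryServiceState class
and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryServiceState:
    """Hypothetical router-side view of one delivery service."""
    generation: int = 0
    assigned: frozenset = frozenset()
    # generation number -> servers assigned in that generation
    history: dict = field(default_factory=dict)

    def __post_init__(self):
        self.history[self.generation] = self.assigned

    def apply_config(self, assigned_servers):
        """A new configuration permanently changes the hash ring,
        so it creates a new generation."""
        new = frozenset(assigned_servers)
        if new != self.assigned:
            self.generation += 1
            self.assigned = new
            self.history[self.generation] = new
        return self.generation

    def routable(self, healthy_servers):
        """A health change only filters the current ring;
        the generation is left untouched."""
        return self.assigned & frozenset(healthy_servers)
```

In this sketch, calling routable() with a smaller set of healthy servers
changes where traffic goes but never the generation number; only
apply_config() with a different server set does.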

I'll open a separate thread to discuss the technical details further,
including an algorithm we have in mind.

I also opened TC-130 - Streamlining TC management and operations sequences
<https://issues.apache.org/jira/browse/TC-130> to track the issue.

I would appreciate the community's input on the issue, especially on the
pros and cons of the two approaches: a Traffic Ops orchestrated solution
vs. a more flexible solution based on a traffic-router algorithm.
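
For reference, the ordering a Traffic Ops orchestrated solution would have
to enforce (add on the traffic servers, then update the router, then remove
on the traffic servers) can be sketched as below. This is a rough
illustration only: plan_rollout and the step labels are hypothetical names,
and each step is assumed to fully propagate before the next one starts.

```python
def plan_rollout(old_servers, new_servers):
    """Return ordered (target, action, servers) steps for one delivery service.

    Hypothetical helper: servers gain their remap rule before the router
    learns about them, and lose it only after the router stops using them.
    """
    old, new = set(old_servers), set(new_servers)
    added, removed = sorted(new - old), sorted(old - new)
    steps = []
    if added:
        steps.append(("traffic-servers", "add remap rule", added))
    if added or removed:
        steps.append(("traffic-router", "publish assignment", sorted(new)))
    if removed:
        steps.append(("traffic-servers", "remove remap rule", removed))
    return steps
```

The router-based alternative would drop this strict ordering and instead
let the router decide per server, at the cost of more logic in the router.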

Nir

On Wed, Feb 1, 2017 at 3:33 PM, Eric Friedrich (efriedri) <
[email protected]> wrote:

> Hey Nir-
>   Interesting thought for sure.
>
> Would TM “health changes” (loss of connectivity, BW/loadavg too high)
> change the generation count? It seems like the answer is Yes, because the
> health of a cache impacts the state of the consistent hash ring.
>
> If so, how do these generation changes get from the Traffic Monitor to the
> caches, when config changes typically come only from Traffic Ops and only
> when ORT is run?
>
> Or maybe the generation count is just an abstraction to conceptualize the
> problem space and not a literal approach?
>
> —Eric
>
> > On Feb 1, 2017, at 4:14 AM, Nir Sopher <[email protected]> wrote:
> >
> > Hi Eric,
> >
> > Formalizing the approach you suggested, one may introduce the concept of a
> > delivery-service configuration "generation", which would be an ordinal
> > identifier for a delivery service configuration. A "generation" changes
> > whenever the remap rule changes or the consistent hash mapping of content
> > to server changes (e.g. due to additional server assignment).
> > In such a solution, each traffic-server may hold a single generation for
> > each delivery service configuration, while traffic-router may hold a
> > history of generations and know which server holds which configuration
> > generation.
> >
> > This approach introduces considerable flexibility. It allows
> > configurations to be set one after the other with no need to wait between
> > them.
> > It also fits well with Jeremy's suggestion for queue-update at
> > delivery-service granularity.
> >
> > On the other hand, complicated algorithms for solving the issue may impose
> > more risk to the network when applied, compared to a simple "traffic-ops"
> > orchestrated solution.
> >
> > I'm not sure what is preferable from an operator's point of view. I'm also
> > not familiar with the TC 3.0 configuration solution, so I cannot validate
> > the different approaches against it.
> >
> > Please share your thoughts,
> > Thanks,
> > Nir
> >
> > On Tue, Jan 31, 2017 at 6:26 PM, Eric Friedrich (efriedri) <
> > [email protected]> wrote:
> >
> >> What about an approach (apologies, still light on details), where TR
> >> (perhaps still via TM) discovers the availability of delivery services from
> >> the cache itself, rather than from the CRConfig file? (Astats or its
> >> remap_stats based replacement would publish its remap rules)
> >>
> >> Any changes to the set of servers (add/remove) or DS assignments would not
> >> require a specific step to push a changed config to the router. If a cache
> >> does not yet have, or no longer has, remap rules for a specific delivery
> >> service, then TR will not see that rule advertised by the cache and will not
> >> send it traffic. If adding or removing a server, TM still needs to be updated
> >> to learn about the new server.
> >>
> >> With the current configuration, there's a race condition of a few seconds
> >> where a cache removes a remap rule before TM polls and TR gets health info
> >> from TM. In these few seconds, TR would erroneously send traffic to a cache
> >> without a proper remap rule.
> >>
> >> We could fix this by
> >>  a) advertising a state of the remap rule in astats to notify TR no
> >> longer to send traffic on that DS for a short period before the rule is
> >> actually removed (all handled inside of ORT).
> >>    or
> >>  b) prematurely removing the remap rule from astats, before the config on
> >> TS is actually updated (at the cost of missing the final few remap stats
> >> numbers). This is probably unacceptable.
> >>
> >> I’m sure there are other variants on this, but my main goal is for TR to
> >> directly learn from the caches which delivery services they actually have
> >> available, rather than learning what TO only thinks each cache has
> >> available.
> >>
> >> —Eric
> >>
> >>
> >>
> >>
> >>
> >>> On Jan 31, 2017, at 8:10 AM, Nir Sopher <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> In order to further improve the simplicity and robustness of the control
> >>> path for provisioning infrastructure and delivery services, we are
> >>> currently considering ways to streamline management and operations.
> >>>
> >>> Currently, when applying changes in traffic-control that require
> >>> synchronization between the traffic-router and traffic-servers, the user
> >>> must take care to apply them in a certain order. Otherwise, "black holes"
> >>> may be created. Furthermore, in some scenarios the user has to wait
> >>> and verify that the configuration has reached all traffic servers before
> >>> applying it to the traffic-router.
> >>>
> >>> We have noticed that TC-3.0 is planned to include a "Config State Machine",
> >>> probably dealing with the issue thoroughly. We have no further information
> >>> about this bullet and would appreciate any additional info.
> >>>
> >>> We would like to start investing in making TC operations more streamlined,
> >>> robust and user-friendly.
> >>>
> >>> The main use-cases we would like to address at this point are:
> >>>
> >>>  1. Assign servers to a Delivery-Service.
> >>>  For this operation, the configuration must first be applied to the added
> >>>  traffic servers, propagate, and only then be applied to the traffic-router.
> >>>  2. Remove server assignments from a Delivery-Service.
> >>>  For this operation, the configuration must first be applied to the
> >>>  traffic-router, and only then to the traffic-servers.
> >>>  3. Add a new delivery service.
> >>>  This is practically a special case of assigning servers to a
> >>>  delivery-service.
> >>>  4. Delete a delivery service.
> >>>  This is practically a special case of removing server assignments from a
> >>>  delivery-service.
> >>>  5. Update settings that must be applied together on the traffic servers
> >>>  and the router.
> >>>
> >>> We would like to simplify the procedure, allowing the deployment of new
> >>> configuration in a single operation, instead of doing it step by step.
> >>>
> >>> One solution can be based on the insight that such configuration
> >>> changes may be deployed by initially updating the traffic servers with added
> >>> functionality (e.g. remap-rule), then updating the router, and lastly,
> >>> removing old functionality from the traffic servers. Such a solution can be
> >>> orchestrated by traffic-ops, probably without complicating other components.
> >>>
> >>> Other solutions may provide more flexibility, but would probably involve
> >>> adding complexity to other components such as traffic-router.
> >>>
> >>> We would be glad to hear the community's thoughts on the matter, so we can
> >>> take this further.
> >>>
> >>> Thanks,
> >>> Nir
> >>
> >>
>
>
