Hi Eric,

Actually, as we imagined it, a "generation" is created only when a new configuration is applied, that is, when the "consistent hash" is permanently modified. Transient health changes reported by Traffic Monitor would not create a new generation.
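
To make the distinction concrete, here is a minimal sketch. It is an illustration only: the class, method and server names are hypothetical and do not correspond to any existing Traffic Control API. It assumes a configuration change is a change to the set of servers assigned to a delivery service, and shows that only such a change advances the generation, while a health report does not.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryServiceState:
    """Hypothetical per-delivery-service state, for illustration only."""

    servers: frozenset = frozenset()  # servers assigned by configuration
    unhealthy: set = field(default_factory=set)  # transient health state
    generation: int = 0

    def apply_config(self, servers):
        """A configuration change permanently modifies the consistent hash,
        so it creates a new generation."""
        servers = frozenset(servers)
        if servers != self.servers:
            self.servers = servers
            self.generation += 1

    def report_health(self, server, healthy):
        """A health change is transient, so the generation is left untouched."""
        if healthy:
            self.unhealthy.discard(server)
        else:
            self.unhealthy.add(server)

    def available(self):
        """Servers that are both assigned and currently healthy."""
        return sorted(self.servers - self.unhealthy)


ds = DeliveryServiceState()
ds.apply_config({"cache-1", "cache-2"})
gen_after_config = ds.generation

ds.report_health("cache-2", healthy=False)
gen_after_health = ds.generation
```

In this sketch the router would still skip an unhealthy cache when routing, but it would not need to record a new generation for it.
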
I'll open a separate thread to discuss the technical details further,
including an algorithm we have in mind.
I also opened TC-130 - Streamlining TC management and operations sequences
<https://issues.apache.org/jira/browse/TC-130> to track the issue.
I would appreciate community input on the issue, especially on the pros and
cons of the two approaches: a Traffic Ops orchestrated solution vs. a more
flexible solution based on a traffic-router algorithm.

Nir

On Wed, Feb 1, 2017 at 3:33 PM, Eric Friedrich (efriedri) <[email protected]> wrote:

> Hey Nir-
> Interesting thought for sure.
>
> Would TM “health changes” (loss of connectivity, BW/loadavg too high)
> change the generation count? It seems like the answer is Yes, because the
> health of a cache impacts the state of the consistent hash ring.
>
> If so, how do these generation changes get from the Traffic Monitor to the
> caches, when config changes typically come only from Traffic Ops and only
> when ORT is run?
>
> Or maybe the generation count is just an abstraction to conceptualize the
> problem space and not a literal approach?
>
> -Eric
>
> > On Feb 1, 2017, at 4:14 AM, Nir Sopher <[email protected]> wrote:
> >
> > Hi Eric,
> >
> > Formalizing the approach you suggested, one may introduce the concept of
> > a delivery-service configuration "generation", which would be an ordinal
> > identifier for a delivery service configuration. A "generation" changes
> > whenever the remap rule changes or the consistent hash mapping of content
> > to server changes (e.g. due to additional server assignment).
> > In such a solution, each traffic-server may hold a single generation for
> > each delivery service configuration, while traffic-router may hold a
> > history of generations and know which server holds which configuration
> > generation.
> >
> > This approach introduces considerable flexibility. It allows
> > configurations to be set one after the other with no need to wait between
> > them.
> > It also fits well with Jeremy's suggestion for queue-update with a
> > delivery service granularity.
> >
> > On the other hand, complicated algorithms for solving the issue may
> > impose more risk on the network when applied, compared to a simple
> > "traffic-ops" orchestrated solution.
> >
> > I'm not sure what is preferable from an operator point of view. I'm also
> > not familiar with the TC 3.0 configuration solution, so I cannot validate
> > the different approaches against it.
> >
> > Please share your thoughts,
> > Thanks,
> > Nir
> >
> > On Tue, Jan 31, 2017 at 6:26 PM, Eric Friedrich (efriedri) <
> > [email protected]> wrote:
> >
> >> What about an approach (apologies, still light on details), where TR
> >> (perhaps still via TM) discovers the availability of delivery services
> >> from the cache itself, rather than from the CRConfig file? (Astats or
> >> its remap_stats based replacement would publish its remap rules.)
> >>
> >> Any changes to the set of servers (add/remove) or DS assignments would
> >> not require a specific step to push a changed config to the router. If a
> >> cache does not yet have, or no longer has, remap rules for a specific
> >> delivery service, then TR will not see that rule advertised by the cache
> >> and will not send it traffic. If adding or removing a server, TM still
> >> needs to be updated to learn about the new server.
> >>
> >> With the current configuration, there's a race condition of a few
> >> seconds where a cache removes a remap rule before TM polls and TR gets
> >> health info from TM. In these few seconds, TR would erroneously send
> >> traffic to a cache without a proper remap rule.
> >>
> >> We could fix this by
> >> a) advertising a state of the remap rule in astats to notify TR no
> >> longer to send traffic on that DS for a short period before the rule is
> >> actually removed (all handled inside of ORT),
> >> or
> >> b) prematurely removing the remap rule from astats, before the config on
> >> TS is actually updated (at the cost of missing the final few remap stats
> >> numbers). This is probably unacceptable.
> >>
> >> I’m sure there are other variants on this, but my main goal is for TR to
> >> directly learn from the caches which delivery services they actually
> >> have available, rather than learning what TO only thinks each cache has
> >> available.
> >>
> >> -Eric
> >>
> >>> On Jan 31, 2017, at 8:10 AM, Nir Sopher <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> In order to further improve the simplicity and robustness of the
> >>> control path for provisioning infrastructure and delivery services, we
> >>> are currently considering ways to streamline management and operations.
> >>>
> >>> Currently, when applying changes in traffic-control that require
> >>> synchronization between the traffic-router and traffic-servers, the
> >>> user must take care to do so in a certain order. Otherwise, "black
> >>> holes" may be created. Furthermore, in some of the scenarios the user
> >>> has to wait and verify that the configuration has reached all traffic
> >>> servers before applying it to the traffic-router.
> >>>
> >>> We have noticed that TC-3.0 is planned to include a "Config State
> >>> Machine", probably dealing with the issue thoroughly. We have no
> >>> further information about this bullet and would appreciate any
> >>> additional info.
> >>>
> >>> We would like to start investing in making TC operations more
> >>> streamlined, robust and user-friendly.
> >>>
> >>> The main use-cases we would like to address at this point are:
> >>>
> >>> 1. Assign servers to a Delivery-Service.
> >>>    For this operation, the configuration must first be applied to the
> >>>    added traffic servers, propagate, and only then be applied to the
> >>>    traffic-router.
> >>> 2. Remove server assignments from a Delivery-Service.
> >>>    For this operation, the configuration must first be applied to the
> >>>    traffic-router, and only then to the traffic-servers.
> >>> 3. Add a new delivery service.
> >>>    This is practically a special case of server assignment to a
> >>>    delivery-service.
> >>> 4. Delete a delivery service.
> >>>    This is practically a special case of server assignment removal
> >>>    from a delivery-service.
> >>> 5. Update settings that must be applied together on the traffic
> >>>    servers and the router.
> >>>
> >>> We would like to simplify the procedure, allowing the deployment of new
> >>> configuration in a single operation, instead of doing it step by step.
> >>>
> >>> One solution can be based on the insight that deploying such
> >>> configuration changes may be done by initially updating the traffic
> >>> servers with added functionality (e.g. a remap rule), then updating the
> >>> router, and lastly, removing old functionality from the traffic
> >>> servers. Such a solution can be orchestrated by traffic-ops, probably
> >>> without complicating other components.
> >>>
> >>> Other solutions may provide more flexibility, but would probably
> >>> involve adding complexity to other components such as traffic-router.
> >>>
> >>> We would be glad to hear the community's thoughts on the matter, so we
> >>> can take this further.
> >>>
> >>> Thanks,
> >>> Nir
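
As a rough illustration of the Traffic Ops orchestrated ordering described in the original message (add on the traffic servers, then update the router, then remove from the traffic servers), here is a minimal sketch. The function name, step names and server names are hypothetical and not part of Traffic Control; the sketch only computes the order of steps and assumes each step has fully propagated before the next one starts.

```python
def plan_rollout(current, desired):
    """Return the ordered steps for moving a delivery service from the
    `current` set of assigned servers to the `desired` set.

    Hypothetical helper: step names are illustrative labels, not real
    Traffic Ops operations.
    """
    current, desired = set(current), set(desired)
    added = sorted(desired - current)
    removed = sorted(current - desired)

    steps = []
    if added:
        # 1. New servers get their remap rules before the router knows them,
        #    so the router never sends traffic to a server lacking the rule.
        steps.append(("add_remap_rules_on_servers", added))
    if added or removed:
        # 2. The router switches to the desired set of servers.
        steps.append(("update_router", sorted(desired)))
    if removed:
        # 3. Old rules are removed only after the router stopped using them.
        steps.append(("remove_remap_rules_from_servers", removed))
    return steps


steps = plan_rollout({"cache-1", "cache-2"}, {"cache-2", "cache-3"})
for name, servers in steps:
    print(name, servers)
```

Use-cases 1 and 2 from the list fall out as the cases where only `added` or only `removed` is non-empty; a mixed change simply runs all three steps in order.
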
