This isn't server-based. The definition file is the same for all servers: it holds an entire CDN's worth of data, and any server in the CDN can build its configs from it.
With regards to remap config, we're building them based on the server's profile in the new API format, not the individual server. I'm not certain whether this will be done with the definition-file method.

> On Apr 11, 2017, at 5:08 PM, Nir Sopher <[email protected]> wrote:
>
> Hi,
>
> In the discussion below, I'm leaving aside the "server's profile" scope of the server configuration and focusing on delivery services. Personally, I believe a clear decoupling should be made between the two scopes (and, if I understand correctly from another thread, there are already steps in this direction).
>
> This relates to a few of the issues we are trying to think through when discussing "self-service" (allowing a non-DevOps user to independently manage his relevant delivery services), and specifically "delivery-service versioning".
>
> When discussing DS versioning, we basically suggest keeping all the configuration revisions for each DS and, when applying a delivery service to a server, applying a specific version of the delivery service to that server. This allows simple rollback of a specific delivery service, better auditing of DS configuration changes and deployments, and testing of DS changes on a subset of servers.
>
> Assuming we hold versions of the delivery-service configuration, should the file you propose really hold the entire configuration for the server? Or just the list of "delivery services" + "versions" for the server?
> If we go only for a list of DS + "config version", the server needs to get the list of DSs from TO (using the suggested file or via the "server/:id/deliveryservice" API) and download the additional files related to its served DSs when a "DS deployed configuration version" changes. By doing so we deal with the scalability issue discussed earlier, as the server does not need to fetch the entire configuration again when a single DS is changed.
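[Editor's note: a minimal sketch of the per-server list Nir describes here, i.e. DS identifiers plus pinned config versions rather than the full configuration. All field names are hypothetical, not an actual Traffic Ops format.]

```json
{
  "server": "edge-cache-01",
  "deliveryServices": [
    { "xmlId": "ds-images", "configVersion": 12 },
    { "xmlId": "ds-video",  "configVersion": 7 }
  ]
}
```

On a version bump for a single DS, the cache would fetch only that DS's files, rather than re-downloading the entire CDN definition.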
> We further remove the need to duplicate DS information in the DB (no longer saving the DS configuration per serving cache).
>
> Two issues I see with what I'm suggesting here:
>
> 1. "remap.config" is cache-instance specific, for example because it may hold the hostname in the remap rule. This, however, can be changed, for example by holding the relevant "meta" in the suggested JSON and letting astat glue things together. Specifically for the hostname in the remap rule, we could probably consider replacing the machine name with a "*".
> 2. "remap.config" is a single file that covers all delivery services, so how can this file be built from multiple DSs? As the file can now be built of multiple "include" statements pointing to other remap files (which can be held per DS), this is a non-issue.
>
> What do you think?
> Nir
>
> On Tue, Apr 11, 2017 at 7:44 PM, Gelinas, Derek <[email protected]> wrote:
>
>> Good questions. Today, when servers are queued, they only check what is currently in the database. This means you need to finish your work, and not start additional work, before the queued run has completed. With the definition files, we will only create a new definition version when work is complete. Once the snapshot is finished, work in Traffic Ops can proceed while configurations are loaded, without changing those configurations.
>>
>> Tiered updates aren't really required, but I understand how in some situations they might be desired. We could make it an optional flag to enable/disable, I suppose.
>>
>> With regards to sending an update to a specific server: that will require some thought. As envisioned, that ability would go away. There might be a way to stage these changes: snapshot the config, but not make it "active." It could then be set as active on specific hosts, tested, and made active for the rest of the CDN. I don't see any reason it wouldn't be possible.
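[Editor's note: Nir's second point above, a remap.config assembled purely from per-DS includes, could look roughly like the following. Filenames are hypothetical, and this assumes an include mechanism such as ATS's `.include` directive.]

```
# Hypothetical top-level remap.config: nothing but per-DS includes,
# so a change to a single delivery service regenerates only that DS's file.
.include remap.d/ds-images.config
.include remap.d/ds-video.config
```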
>>
>> With regards to unassigning a delivery service from a cache, it'd be the same method as today: unassign, snapshot the CRConfig, then snapshot the cache config. I'm thinking some interface work might go a long way toward making the entire process clearer and more accessible, too.
>>
>> Derek
>>
>> On 4/11/17, 10:54 AM, "Eric Friedrich (efriedri)" <[email protected]> wrote:
>>
>> A few questions/thoughts; apologies for not in-lining:
>>
>> 1) If we move away from individually queued updates, we give up the ability to make changes and then selectively deploy them. How often do TC operations teams make config changes but not immediately queue updates? (I personally think we currently have a bit of a tricky situation where queuing updates much later can push an unknowingly large config change down to a cache, i.e. many new DSs added/removed since updates were last queued, maybe months earlier.) I wouldn't be sad to see queue updates go away, but I don't want to cause hardship for operators using that feature.
>>
>> 2) If we move away from individually queued updates, how does that affect the implicit "config state machine"? Specifically, how will edges know when their parents have been configured and are ready for service? Today we don't config an edge cache with a new DS unless the mid is ready to handle traffic as well.
>>
>> 3) If we move away from individually queued updates, how do we do things like unassign a delivery service from a cache? Today we have to snapshot the CRConfig first, to stop redirects to the cache, before we queue the update. If updates are immediately applied and the snapshot is still separate, how do we get TR to stop sending traffic to a cache that no longer has the remap rule?
>>
>> 4) Also along the lines of the config state machine: we never really closed on whether we would make any changes to the queue-update/snapshot-CRConfig flow.
>> If we are looking at redoing how we generate config files, it would be great to have consensus on an approach (if not an implementation) that removes the need to sequence queue updates and CRConfig snapshots. I think the requirement here would be to have Traffic Control figure out on its own when to activate/deactivate routing to a cache from TR.
>>
>> 5) I like the suggestion of cache-based config file generation.
>> - Caches only retrieve relevant information, so scaling proportional to the number of caches/DSs in the CDN is much better.
>> - We could modify TR/TM to use the same approach, rather than snapshotting a CRConfig.
>> - Cache/TR/TM-based config could play a greater role in the config state machine, rather than having Traffic Ops build static configuration ahead of time.
>>
>> Downsides:
>> - Versioning is still possible, but more work than maintaining snapshots of a config file.
>> - We have to be very careful with API changes; any breakage now impacts cache updates.
>>
>> —Eric
>>
>>> On Apr 10, 2017, at 9:45 PM, Gelinas, Derek <[email protected]> wrote:
>>>
>>> Thanks Rob. To your point about scalability: I think this is more scalable than the current CRConfig implementation due to the caching. However, that is a very valid point and one that has been considered. I've started looking into the problem from that angle and hope to have some more solid data soon. I still believe this is ultimately more scalable than the current config implementation, even with the scope caching, but the proof will be in the data.
>>>
>>> Derek
>>>
>>>> On Apr 10, 2017, at 9:23 PM, Robert Butts <[email protected]> wrote:
>>>>
>>>> I'd propose:
>>>> * Instead of storing the JSON as a blob, use https://www.postgresql.org/docs/9.2/static/datatype-json.html
>>>> * Instead of a version-then-file request, use a "latest" endpoint with `If-Modified-Since` (https://tools.ietf.org/html/rfc7232#section-3.3).
>>>> We can also serve each version at its own endpoint, but `If-Modified-Since` lets us determine whether there's a new snapshot and get it in a single request, both efficiently and using a standard. (We should do the same for the CRConfig.)
>>>>
>>>> Also, for cache-side config generation, consider https://github.com/apache/incubator-trafficcontrol/pull/151. It's a prototype and needs work to bring it to production, but the basic functionality is there. Go is safer and faster to develop in than Perl, and this is already encapsulated in a library, with both CLI and HTTP microservice examples. I'm certainly willing to help bring it to production.
>>>>
>>>> "a single definition file for each CDN which will contain all the information required for any server within that CDN to generate its own configs"
>>>>
>>>> Also, long-term, that doesn't scale, nor does the CRConfig. As Traffic Control is deployed on larger and larger CDNs, the CRConfig grows uncontrollably. It's already 5-7 MB for us, which takes an approaching-unreasonable amount of time for Traffic Monitor and Traffic Router to fetch. This isn't an immediate concern, but long-term we need to develop a scalable solution, something that says "only give me the data modified since this timestamp".
>>>>
>>>> Again, this isn't an immediate crisis. I only mention it now because, if a scalable solution is about the same amount of work, now sounds like a good time. If it's relevantly more work, no worries.
>>>>
>>>> But otherwise, +1. We've long needed to separate our concerns of Traffic Ops and the cache application.
>>>>
>>>> On Mon, Apr 10, 2017 at 5:05 PM, Gelinas, Derek <[email protected]> wrote:
>>>>
>>>>> I would like to propose a new method for ATS config file generation, in which a single definition file for each CDN would contain all the information required for any server within that CDN to generate its own configs, rather than requesting them from Traffic Ops. This would be a version-controlled JSON file that, when generated, would be stored in a new table in the Traffic Ops database as a blob type. This will satisfy high-availability requirements and allow several versions of the configuration to be retained for rollback, as well as "freezing" the config at that moment in time. Combined with cache support coming in 2.1, this file would only need to be generated once per Traffic Ops server instance.
>>>>> Instead of queueing servers to update their configurations, the configuration would be snapshotted, similar to the CRConfig file, and downloaded by each cache according to its set interval checks. Rather than performing a syncds run and checking whether the server has been queued for update, the version number would simply be checked and compared against the currently active version on the cache itself. Should a difference be found, the server would request the definition file and begin generating configuration files for itself using the data in the definition file.
>>>>>
>>>>> I would like feedback from the community regarding this proposal, and any suggestions or comments you may have.
>>>>>
>>>>> Thanks!
>>>>> Derek
>>>>>
>>>>> Derek Gelinas
>>>>> IPCDN Engineering
>>>>> [email protected]
>>>>> 603.812.5379
