Re: Delivery Service Origin Refactor

Eric Friedrich (efriedri) Wed, 14 Mar 2018 09:09:42 -0700

How much does distance to origin actually impact a Live delivery service? 

  It’s only the first client for each segment that has to go back to the origin; 
the rest of the responses are cached, so they don’t touch the origin at all.



What’s the use case for HTTP_NO_CACHE with Client Steering? Is there a reason 
plain-old MultiSiteOrigin wouldn’t work for you on the NO_CACHE delivery 
services?

—Eric





> On Mar 14, 2018, at 11:45 AM, Rawlin Peters <[email protected]> wrote:
> 
> Yes, I'd say that's essentially the main goal - prioritizing redundant
> HTTP_LIVE/HTTP_NO_CACHE deliveryservices to have the shortest distance
> between the edge and the origin. HTTP-type deliveryservices that use
> the MID tier, though, don't really make sense for this feature, so we
> might want to limit geo-steering targets to just
> HTTP_LIVE/HTTP_NO_CACHE.
> 
> On Wed, Mar 14, 2018 at 8:56 AM, Eric Friedrich (efriedri)
> <[email protected]> wrote:
>> I understand the goals behind Client Steering Delivery Services, but I don’t 
>> fully understand the motivation behind these changes.
>> 
>> We want redundancy in our live delivery services.  Take a national channel 
>> and make two copies of it on two origins. Clients can choose DiscoveryA or 
>> DiscoveryB based on which one is working better (or at all) for them. All 
>> CGs would need both Delivery Services assigned to account for failures where 
>> caches holding either Just A or Just B might go offline.
>> 
>> OriginA and OriginB would typically be in different locations for the most 
>> redundancy.
>> 
>> If I’m a client in Boston and I ask for Discovery channel, TR should give me 
>> redirects to DiscoveryA and DiscoveryB  both in the Boston cache group.
>> 
>> Is the goal behind this feature for TR to prioritize the DS list given to 
>> the client based on how far origin A is from Boston vs. how far B is from 
>> Boston?
>> 
>> —Eric
>> 
>> 
>> 
>> 
>> 
>> 
>>> On Mar 13, 2018, at 1:42 PM, Rawlin Peters <[email protected]> wrote:
>>> 
>>> replies inline
>>> 
>>> On Mon, Mar 12, 2018 at 5:21 PM, Nir Sopher <[email protected]> wrote:
>>>> Thank you Rawlin for the clarification:)
>>> 
>>> You're welcome. Anything I can do to help :)
>>> 
>>>> 
>>>> Still, I feel like I'm missing a piece of the puzzle here.
>>>> Maybe I do not understand the relationship between "origin" and "steering target".
>>>> 
>>>> As I see it, the router's job is to send end users to the optimal cache. It
>>>> has 2 tools for doing so: CZF and Geo.
>>>> Using the CZF is preferable, as it is based on the real network topology.
>>>> Geo is a best effort solution, used when we cannot do better. It is not
>>>> necessarily optimal, and has GEO misses, but we must use it since we cannot
>>>> map all IPs.
>>> 
>>> 
>>> Yes, the client's location will be found from the CZF first, falling
>>> back to GEO upon a CZF-miss. Then the most optimal edge cachegroup is
>>> chosen for each steering target deliveryservice. Then, the resulting
>>> list of target deliveryservices will be sorted by total distance
>>> following the path from client -> edge -> origin.
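
The ordering described above can be sketched in a few lines. This is a minimal illustration, not Traffic Router's actual code: the `Point` and `Target` types, the `haversine_km` and `order_targets` names, and the use of great-circle distance are all assumptions made for the example. It also assumes the edge cachegroup for each steering target has already been selected.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    lat: float  # degrees
    lon: float  # degrees


@dataclass(frozen=True)
class Target:
    """Hypothetical steering target: a delivery service, the edge
    cachegroup already chosen for it, and its origin's location."""
    name: str
    edge: Point
    origin: Point


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def order_targets(client: Point, targets: list[Target]) -> list[Target]:
    """Sort targets by total distance along client -> edge -> origin."""
    return sorted(
        targets,
        key=lambda t: haversine_km(client, t.edge) + haversine_km(t.edge, t.origin),
    )


if __name__ == "__main__":
    # Illustrative coordinates only: a Boston client, both targets served
    # from a Boston edge, with origins in New York and Denver.
    boston = Point(42.36, -71.06)
    targets = [
        Target("DiscoveryB", edge=boston, origin=Point(39.74, -104.99)),
        Target("DiscoveryA", edge=boston, origin=Point(40.71, -74.01)),
    ]
    for t in order_targets(boston, targets):
        print(t.name)
```

When both targets share the same edge cachegroup, as in the Boston example elsewhere in this thread, the client -> edge leg is identical and the order is decided by the edge -> origin leg alone.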
>>> 
>>>> 
>>>> The cache's job is to fetch the content and serve the user.
>>>> It can be optimized to bring the content from the optimal Origin. It can be
>>>> configured to do so by specifying the best origin per cache group (in ops
>>>> DB).
>>> 
>>> 
>>> This is intentionally done as a CLIENT_STEERING deliveryservice so
>>> that a smart client can make the decision to use a different
>>> deliveryservice upon failure. If this decision was made at the caching
>>> proxy level, it would end up being like an optimized version of MSO
>>> (multi-site origin) where the client only has a single URL to request
>>> and the most optimal origin of multiple origins is chosen by the
>>> caching proxy. I don't think that's a bad idea; it's just not the
>>> architecture we want for this. By doing it as client steering we can
>>> also assign weights/ordering between colocated origins and update
>>> those steering assignments at any time. We can form the steering
>>> target list very flexibly this way.
>>> 
>>> 
>>>> I might be naive here, but as the number of cache groups is reasonable, and
>>>> their network location is much clearer than the end user location, the
>>>> mapping and configuration would be reasonable. Therefore, using sub-optimal
>>>> Geo as a tool for choosing the Origin can be avoided.
>>> 
>>> 
>>> In practice, you could set the coordinates of the Origin to that of
>>> the most optimal cachegroup, rather than assigning the Origin directly
>>> to said cachegroup. The effect would be the same I believe.
>>> 
>>>> 
>>>> I also did not understand if the suggestion is to use the client location
>>>> for choosing the origin, or the cache group location for choosing the
>>>> origin.
>>>> Using the client location for choosing the origin practically ignores the
>>>> accurate information provided by the CZF.
>>> 
>>> 
>>> It's a combination of the client location, the edge location, and the
>>> origin location (total distance from client -> edge -> origin).
>>> 
>>>> 
>>>> What am I missing?
>>>> 10x
>>>> Nir
>>>> 
>>>> On Mon, Mar 12, 2018 at 11:19 PM, Rawlin Peters <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hey Nir,
>>>>> 
>>>>> I think part of the motivation for doing this in Traffic Router rather
>>>>> than the Caching Proxy is separation of concerns. TR is already
>>>>> concerned with routing a client to the best cache based upon the
>>>>> client's location, so TR is already well-equipped to make the decision
>>>>> of how Delivery Services (origins) should be prioritized based upon
>>>>> the client's location. That way the Caching Proxy (e.g. ATS) doesn't
>>>>> need to concern itself with its own location, the client's location,
>>>>> and the location of origins; it just needs to know how to get the
>>>>> origin's content and cache it. All the client needs to know is that
>>>>> they have a prioritized list of URLs to choose from; they don't need
>>>>> to be concerned about origin/edge locations because that
>>>>> prioritization will be made for them by TR.
>>>>> 
>>>>> The target DSes will have different origins primarily because they
>>>>> will be in different locations, and the origins should be
>>>>> interchangeable in terms of the content they provide because a smart
>>>>> client may fail over to any of the target DSes in a CLIENT_STEERING DS
>>>>> for the same content.
>>>>> 
>>>>> - Rawlin
>>>>> 
>>>>> On Mon, Mar 12, 2018 at 2:37 PM, Nir Sopher <[email protected]> wrote:
>>>>>> Hi Rawlin,
>>>>>> Can you please add a few words on the motivation behind basing the
>>>>>> steering target selection on the location of the client?
>>>>>> As the content goes through the caches, isn't it the job of the cache to
>>>>>> select the best origin for the cache?  Why should the client be the one
>>>>>> to take the origin location into consideration?
>>>>>> Why do the target DSes have different origins in the first place? Do they
>>>>>> have different characteristics in addition to their location?
>>>>>> Thanks,
>>>>>> Nir
>>>>>> 
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Rawlin Peters <[email protected]>
>>>>>> Date: Mon, Mar 12, 2018 at 9:46 PM
>>>>>> Subject: Delivery Service Origin Refactor
>>>>>> To: [email protected]
>>>>>> 
>>>>>> 
>>>>>> Hey folks,
>>>>>> 
>>>>>> As promised, this email thread will be to discuss how to best
>>>>>> associate an Origin Latitude/Longitude with a Delivery Service,
>>>>>> primarily so that steering targets can be ordered/sent to the client
>>>>>> based upon the location of those targets (i.e. the Origin), a.k.a.
>>>>>> Steering Target Geo-Ordering. This is potentially going to be a pretty
>>>>>> large change, so all your feedback/questions/concerns are appreciated.
>>>>>> 
>>>>>> Here were a handful of bad ideas I had in order to accomplish this DS
>>>>>> Origin Lat/Long association (feel free to skip to PROPOSED SOLUTION
>>>>>> below):
>>>>>> 
>>>>>> 1. Reuse the current MSO (multisite origin) backend (i.e. add the
>>>>>> origin into the servers table, give it a lat/long from its cachegroup,
>>>>>> assign the origin server to the DS)
>>>>>> Pros:
>>>>>> - reuse of existing db schema, probably wouldn't have to add any new
>>>>>> tables/columns
>>>>>> Cons:
>>>>>> - MSO configuration is already very complex
>>>>>> - for the simple case of just wanting to give an Origin a lat/long you
>>>>>> have to create a server (of which only a few fields make sense for an
>>>>>> Origin), add it to a cachegroup (only name and lat/long make sense,
>>>>>> won't use parent relationships, isn't really a "group" of origins),
>>>>>> assign it to a server profile (have to create one first, no parameters
>>>>>> are needed), and finally assign that Origin server to the delivery
>>>>>> service (did I miss anything?)
>>>>>> 
>>>>>> 2. Add Origin lat/long columns to the deliveryservice table
>>>>>> Pros:
>>>>>> - probably the most straightforward solution for Steering Target
>>>>>> Geo-Ordering given that Origin FQDN is currently a DS field.
>>>>>> Cons:
>>>>>> - doesn't work well with MSO
>>>>>> - could be confused with Default Miss Lat/Long
>>>>>> - if two different delivery services use colocated origins, the same
>>>>>> lat/long needs to be entered twice
>>>>>> - adds yet another column to the crowded deliveryservice table
>>>>>> 
>>>>>> 3. Add origin lat/long parameters to a Delivery Service Profile
>>>>>> Pros:
>>>>>> - Delivery Services using colocated origins could share the same profile
>>>>>> - no DB schema updates needed
>>>>>> Cons:
>>>>>> - profile parameters lack validation
>>>>>> - still doesn't support lat/long for multiple origins associated with a
>>>>>> DS
>>>>>> 
>>>>>> 4. Add the lat/long to the steering target itself (i.e. where you
>>>>>> choose weight/order, you'd also enter lat/long)
>>>>>> Pros:
>>>>>> - probably the easiest/quickest solution in terms of development
>>>>>> Cons:
>>>>>> - only applies lat/long to a steering target
>>>>>> - using the same target in multiple Steering DSes means having to keep
>>>>>> the lat/long synced between them all
>>>>>> - lat/long not easily reused by other areas that may need it in the
>>>>>> future
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> PROPOSED SOLUTION:
>>>>>> 
>>>>>> All of those ideas were suboptimal, which is why I think we need to:
>>>>>> 1. Split Locations out of the cachegroup table into their own table
>>>>>> with the following columns (cachegroup would have a foreign key to
>>>>>> Location):
>>>>>> - name
>>>>>> - latitude
>>>>>> - longitude
>>>>>> 
>>>>>> 2. Split Origins out of the server and deliveryservice tables into
>>>>>> their own table with the following columns:
>>>>>> - fqdn
>>>>>> - protocol (http or https)
>>>>>> - port (optional, can be inferred from protocol)
>>>>>> - location (optional FK to Location table)
>>>>>> - deliveryservice FK (if an Origin can only be associated with a
>>>>>> single DS. Might need step 3 below for many-to-many)
>>>>>> - ip_address (optional, necessary to support `use_ip_address` profile
>>>>>> parameter for using the origin's IP address rather than fqdn in
>>>>>> parent.config)
>>>>>> - ip6_address (optional, necessary because we'd have an ip_address
>>>>>> column for the same reasons)
>>>>>> - profile (optional, primarily for MSO-specific parameters - rank and
>>>>>> weight - but I could be convinced that this is unnecessary)
>>>>>> - cachegroup (optional, necessary to maintain primary/secondary
>>>>>> relationship between MID_LOC and ORG_LOC cachegroups for MSO)
>>>>>> 
>>>>>> 3. If many-to-many DSes to Origins will still be possible, create a
>>>>>> new deliveryservice_origin table to support a many-to-many
>>>>>> relationship between DSes and origins
>>>>>> - the rank/weight fields for MSO could be added here possibly, maybe
>>>>>> other things as well?
>>>>>> 
>>>>>> 4. Consider constraints in the origin and deliveryservice_origin table
>>>>>> - must fqdn alone be unique? fqdn, protocol, and port combined?
>>>>>> 
>>>>>> The process for creating a Delivery Service would change in that
>>>>>> Origins would have to be created separately and added to the delivery
>>>>>> service. However, to aid migration to the new way of doing things, our
>>>>>> UIs could keep the "Origin FQDN" field but the API backend would then
>>>>>> create a new row in the Origin table and add it to the DS. More
>>>>>> Origins could then be added (for MSO purposes) to the DS via a new API
>>>>>> endpoint. MSO configuration would change at least in how Origins are
>>>>>> assigned to a DS ("server assignments" would then just be for
>>>>>> EDGE-type servers).
>>>>>> 
>>>>>> Cachegroup creation also changes in that Locations need to be created
>>>>>> before associating them to a Cachegroup. However, our UIs could also
>>>>>> stay the same with the backend API updated to create a Location from
>>>>>> the Cachegroup request and tie it to the Cachegroup.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I know there are a lot of backend and frontend implications with these
>>>>>> changes that would still need to be worked out, but in general does
>>>>>> this proposal sound good? Questions/concerns/feedback welcome and
>>>>>> appreciated!
>>>>>> 
>>>>>> - Rawlin
>>>>> 
>>
