Re: Backup Cache Group Selection

Jeff Elsloo Thu, 30 Mar 2017 10:45:52 -0700

Yes, that's correct.
--
Thanks,
Jeff


On Thu, Mar 30, 2017 at 11:20 AM, Eric Friedrich (efriedri)
<[email protected]> wrote:
> Thanks Jeff-
>   Could I think of it as the following? Echoing back to be sure I 
> understand...
>
>  If there is a lat/long for a cache group in the CZF, any client that hits 
> that CG should use the CZF lat/long as the client’s lat/long instead of using 
> geolocation.
>
> For the purposes of finding the closest cache group, the client’s location (from 
> the CZF as above or from the Geolocation provider) will be compared against the 
> location of the caches as configured in the Traffic Ops CG record?
>
> —Eric
>
>
>> On Mar 30, 2017, at 1:07 PM, Jeff Elsloo <[email protected]> wrote:
>>
>> It could now be considered the "average" of the location of the
>> clients within that section of the CZF, however, it should be noted
>> that the addition of the geo coordinates to the CZF is relatively new.
>> Previously we never had the ability to specify lat/long on those
>> cachegroups, and we solely relied on those specified in edgeLocations,
>> meaning that the matches had to be 1:1. Adding the coordinates allowed
>> us to cover edge cases and miss scenarios and stick to the CZF
>> whenever possible. Previously when we had no coordinates, and we had a
>> hit in the CZF but no corresponding hit within the edgeLocations
>> (health, assignments, etc), we would fall back to the Geolocation
>> provider.
>> --
>> Thanks,
>> Jeff
>>
>>
>> On Thu, Mar 30, 2017 at 5:29 AM, John Shen (weifensh)
>> <[email protected]> wrote:
>>> Thanks Jeff and Oren for the discussion. I agree now that lat/long from CZF 
>>> is the “average” location of clients, and lat/long from Ops is the location 
>>> of a certain Cache Group. So it appears to be reasonable to use them as 
>>> source and dest to calculate the distance.
>>>
>>> Thanks,
>>> John
>>>
>>>
>>> On 30/03/2017, 6:55 PM, "Oren Shemesh" <[email protected]> wrote:
>>>
>>>    Jeff, having read this conversation more than once, I believe there is a
>>>    misunderstanding regarding the ability to provide coordinates for cache
>>>    groups both in the CZF and in the TO DB.
>>>
>>>    Here is what I believe is a description which may help understanding the
>>>    current behaviour:
>>>
>>>    The coordinates specified in the CZF for a cache group are not supposed
>>>    to be exactly the same as the coordinates in the TO DB for the same cache
>>>    group.
>>>    This is because they do not represent the location of the caches of the
>>>    group.
>>>    They represent the (average) location of clients found in the subnets
>>>    specified for this cache group.
>>>
>>>    This, I believe, explains both the behaviour of the code (why the
>>>    coordinates from the CZF are used for the source, but the coordinates
>>>    from the TO DB are used for the various candidate cache groups), and the
>>>    fact that there is a 'duplication'.
>>>
>>>    Is this description true ?
>>>
>>>
>>>
>>>    On Wed, Mar 29, 2017 at 7:02 PM, Jeff Elsloo <[email protected]> wrote:
>>>
>>>> The cachegroup settings in the Traffic Ops GUI end up in the
>>>> `edgeLocations` section of the CRConfig. This is the source of truth
>>>> for where caches are deployed, logically or physically. We do not
>>>> provide a means to generate a CZF in Traffic Ops, so it's up to the
>>>> end user to craft one to match what is in Traffic Ops.
>>>>
>>>> There are several cases that need to be accounted for where a hit in
>>>> the CZF does match what's in `edgeLocations`, but cannot be served
>>>> there due to cache health, delivery service health, or delivery
>>>> service assignments. The other edge case is a hit where no
>>>> `edgeLocation` exists, which again, must be accounted for. Presumably
>>>> we have higher fidelity data in our CZF than we would in our
>>>> Geolocation provider and we should use it whenever possible.
>>>>
>>>> Think about this: what if you use the same CZF for two configured
>>>> CDNs, but one of the two CDNs only has caches deployed to 50% of the
>>>> cache groups defined in the CZF? Would we want to use the Geolocation
>>>> provider in the event that our source address matches a cachegroup
>>>> that does not have any assigned caches? We would ideally have as much
>>>> granularity as possible in the CZF, then use that to inform the
>>>> decision about which cachegroup should service the request instead of
>>>> falling back to a lower fidelity datasource. This is especially true
>>>> in the case of RFC 1918 addresses that might appear in one's CZF.
>>>>
>>>> Thanks,
>>>> Jeff
>>>>
>>>>
>>>> On Wed, Mar 29, 2017 at 9:12 AM, John Shen (weifensh)
>>>> <[email protected]> wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> Thank you for the detail. I am wondering why there are two sets of
>>>>> lat/long, i.e. one in the CZF and the other in the Ops GUI Cache Group
>>>>> setting. To calculate the closest CG when the matched CG in the CZF is not
>>>>> available, the source lat/long is from the matched CZF entry, and the dest
>>>>> lat/long is from the Ops setting, which doesn't seem to be consistent. Is
>>>>> there any reason why TR has this behavior?
>>>>>
>>>>> Since there are two sets of lat/long in TR, can we just use the lat/long
>>>>> from the Ops CG settings for everything to get the closest, and not care
>>>>> about the values set in the CZF? At least this will avoid inconsistent
>>>>> config for lat/long.
>>>>>
>>>>> Thanks,
>>>>> John
>>>>>
>>>>> ---Original---
>>>>> From: "Jeff Elsloo "<[email protected]>
>>>>> Date: 2017/3/29 22:45:12
>>>>> To:
>>>>> "[email protected]"<[email protected].
>>>> apache.org>;
>>>>> Subject: Re: Backup Cache Group Selection
>>>>>
>>>>> Yes, it's expected behavior. What you're describing sounds like a
>>>>> cachegroup in the CZF without any corresponding configuration in
>>>>> Traffic Ops, or a cachegroup with configuration in Traffic Ops, but
>>>>> with no available caches (DS assignments, health, etc).
>>>>>
>>>>> Presuming we have configured geolocation coordinates within the CZF,
>>>>> we know the lat/long of the cachegroup within the CZF that contains
>>>>> the source address. We can then order our list of cachegroups by
>>>>> lat/long, then select the "next best" cache group by distance and
>>>>> availability. That will be the actual cachegroup to serve the request;
>>>>> this prevents a miss on the CZF that would normally be routed to the
>>>>> Geolocation service selected for the DS.
>>>>>
>>>>> We do have a slight gap around logging, and maybe that's part of the
>>>>> question. What we see in the log is the selected lat/long, not the
>>>>> source lat/long of the hit, so we can't easily tell when we're in this
>>>>> case by simply looking at logs. This could be an area of improvement;
>>>>> however, we'll need to be careful not to clutter the logs with
>>>>> unnecessary information. In most cases the hit is the selected
>>>>> cachegroup, so we need to be careful to not just add "source" and
>>>>> "actual" coordinates to the log because it'll be identical in most CZF
>>>>> hit cases.
>>>>>
>>>>> Thanks,
>>>>> Jeff
>>>>>
>>>>>
>>>>> On Wed, Mar 29, 2017 at 7:02 AM, John Shen (weifensh)
>>>>> <[email protected]> wrote:
>>>>>> Hi Jeff,
>>>>>>
>>>>>> I have just tried the getClosestCacheLocation() logic. It appears the
>>>>>> CZF-matched lat/long does come from the CZF, but the lat/long of the
>>>>>> “closest” Cache Groups is from the configuration in Ops. This means that
>>>>>> to calculate the distance between the matched CG and the “closest” CG, the
>>>>>> source lat/long is from the CZF, but the dest lat/long is not from the CZF
>>>>>> but from the CG settings in Ops. Is this expected behavior?
>>>>>>
>>>>>> Thanks,
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 27/01/2017, 10:51 PM, "Jeff Elsloo" <[email protected]> wrote:
>>>>>>
>>>>>>    Steve: I don't think the patch is required, however, as Eric found,
>>>>>>    without the patch there could be some gaps depending on the scenario.
>>>>>>    That specific scenario revolved around the "next best cache group" not
>>>>>>    having a DS assigned, or a healthy cache with the DS assigned. In that
>>>>>>    case, despite the hits, you would still end up falling through to the
>>>>>>    geolocation provider. The patch addresses that.
>>>>>>
>>>>>>    Eric: The rloc field is set via the Geolocation associated with the
>>>>>>    CacheLocation, which ultimately comes from the edgeLocations section
>>>>>>    of the CRConfig. When a CZF lookup is performed inside TR, a hit
>>>>>>    returns a CacheLocation. When caches aren't available within that
>>>>>>    CacheLocation, getClosestCacheLocation() is called, and that's why you
>>>>>>    see the lat/long of the "next best cache group" instead of the actual
>>>>>>    hit's lat/long.
>>>>>>
>>>>>>    If we want to have granularity in this situation, we might need to 1)
>>>>>>    create a new ResultType, such as ResultType.CZ_NEXT (or something),
>>>>>>    and/or 2) massage the log format such that we either have the
>>>>>>    original lat/long and the new lat/long in the rloc field, or create a
>>>>>>    new field to save one or the other, such that we log both lat/longs.
>>>>>>
>>>>>>    Thoughts? Whatever we decide should go into TC-90 so we can apply the
>>>>>>    proposed patch and improve the logging.
>>>>>>    --
>>>>>>    Thanks,
>>>>>>    Jeff
>>>>>>
>>>>>>
>>>>>>    On Fri, Jan 27, 2017 at 7:14 AM, Eric Friedrich (efriedri)
>>>>>>    <[email protected]> wrote:
>>>>>>> The rloc field usually indicates the Geolocation IP of the client
>>>>>>> (short for request location).
>>>>>>>
>>>>>>> But here it looks like rloc is reflecting the location of the CG it
>>>>>>> ultimately redirected to (response location?).
>>>>>>>
>>>>>>> I would have expected the rloc field to either
>>>>>>>   1) be blank (because we never did a lookup from geoprovider)
>>>>>>>        or
>>>>>>>   2) contain the coordinates of the cache group the CZF hit on
>>>>>>>      (in this case us-ga-macon at 32.7261, -83.6547)
>>>>>>>
>>>>>>> —Eric
>>>>>>>
>>>>>>>> On Jan 27, 2017, at 8:28 AM, Steve Malenfant <
>>>> [email protected]>
>>>>>> wrote:
>>>>>>>>
>>>>>>>> Jeff,
>>>>>>>>
>>>>>>>> CZF properly installed: yes
>>>>>>>> Network address or not: same behavior
>>>>>>>>
>>>>>>>> But you nailed the API one. There is no cache assigned to us-ga-macon,
>>>>>>>> which is exactly what I'm testing.
>>>>>>>>
>>>>>>>> I added cache groups for my testing in the lab and assigned a few
>>>>>>>> caches to them:
>>>>>>>>
>>>>>>>> - us-ga-atlanta 34.0362 -84.3207
>>>>>>>> - us-ok-oklahomacity 35.4777 -97.5545
>>>>>>>> - us-va-nova 38.7922 -77.2136
>>>>>>>> - us-ca-sandiego 32.7205 -117.0838
>>>>>>>>
>>>>>>>> API:
>>>>>>>>
>>>>>>>> {"locationByGeo":{"city":"Macon","countryCode":"US","latitude":"32.7288","postalCode":"31216","countryName":"United States","longitude":"-83.6865"},"locationByFederation":"not found","requestIp":"24.252.192.1","locationByCoverageZone":"not found"}
>>>>>>>>
>>>>>>>> Using the X-MM-Client-IP it returned the proper cache based on CZ; it
>>>>>>>> correctly sent the request to the cache in us-ga-atlanta:
>>>>>>>> 1485522786.423 qtype=HTTP chi=24.252.192.1 url="http://crs.cox-col-jitp2.cdn1.coxlab.net/" cqhm=GET cqhv=HTTP/1.1 rtype=CZ rloc="34.03,-84.32" rdtl=- rerr="-" rgb="-" pssc=302 ttms=0.260 rurl="http://cdn1cdedge0007.cox-col-jitp2.cdn1.coxlab.net/" rh="-"
>>>>>>>>
>>>>>>>> I then changed the coordinate to match the us-ca-sandiego group in the CZF
>>>>>>>> and now the request is sent to the us-ca-sandiego caches:
>>>>>>>> 1485523546.345 qtype=HTTP chi=24.252.192.1 url="http://crs.cox-col-jitp2.cdn1.coxlab.net/" cqhm=GET cqhv=HTTP/1.1 rtype=CZ rloc="32.72,-117.08" rdtl=- rerr="-" rgb="-" pssc=302 ttms=0.206 rurl="http://cdn1cdedge0001.cox-col-jitp2.cdn1.coxlab.net/" rh="-"
>>>>>>>>
>>>>>>>> I'm using 1.6.1 + patch discussed in this email. Not sure if those are
>>>>>>>> necessary but I'll need to try on unpatched version.
>>>>>>>>
>>>>>>>> Do we want to fix API to reflect CZF?
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2017 at 4:47 PM, Jeff Elsloo
>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Dave just let me know that in this case you don't have any caches
>>>>>>>>> assigned in us-ga-macon. I'm not sure how the API behaves at that
>>>>>>>>> point; it likely won't follow the same "next best cache group" logic,
>>>>>>>>> as it was designed as a simple lookup tool.
>>>>>>>>>
>>>>>>>>> Can you try simulating a request through Traffic Router directly using
>>>>>>>>> the X-MM-Client-IP header, or fakeClientIpAddress query parameter
>>>>>>>>> using the example IP of 24.252.192.0? After you do so, check the
>>>>>>>>> coordinates in the log entry and see if the result is a CZ hit.
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> Jeff
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 26, 2017 at 2:03 PM, Jeff Elsloo
>>>>>> <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> Are you 100% sure that the Traffic Router has loaded the updated CZF?
>>>>>>>>>> If so, what happens when you use an IP within the /20 instead of the
>>>>>>>>>> network address (.0)? I tried using a network address of a /22 on a
>>>>>>>>>> 1.8 TR and it hit the CZF as expected. Ultimately what you're seeing
>>>>>>>>>> is a CZF miss, unrelated to the geo coordinates.
>>>>>>>>>>
>>>>>>>>>> The underlying feature with the coordinates is to select the next best
>>>>>>>>>> cache group by proximity where healthy caches have a given delivery
>>>>>>>>>> service assigned. In order to test that, you would need to have a CZF
>>>>>>>>>> hit in a cache group which doesn't have that particular delivery
>>>>>>>>>> service assigned to any caches, or have all caches within that cache
>>>>>>>>>> group with that delivery service in an unhealthy state.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks,
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 25, 2017 at 1:33 PM, Steve Malenfant
>>>>>> <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>> Jeff,
>>>>>>>>>>>
>>>>>>>>>>> I've tried this coverage zone file coordinate overwrite... I might be
>>>>>>>>>>> missing something.
>>>>>>>>>>>
>>>>>>>>>>> I defined the following :
>>>>>>>>>>>
>>>>>>>>>>>       "us-ga-macon": {
>>>>>>>>>>>>           "coordinates": {
>>>>>>>>>>>>               "latitude": "32.7261",
>>>>>>>>>>>>               "longitude": "-83.6547"
>>>>>>>>>>>>           },
>>>>>>>>>>>>           "network": [
>>>>>>>>>>>>               "24.252.192.0/20",
>>>>>>>>>>>>               "68.1.20.0/22",
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then issued the following query :
>>>>>>>>>>>
>>>>>>>>>>>> curl http://traffic_router:3333/crs/stats/ip/24.252.192.0
>>>>>>>>>>>>
>>>>>>>>>>>> {"locationByGeo":{"city":"Macon","countryCode":"US","latitude":"32.7288","postalCode":"31216","countryName":"United States","longitude":"-83.6865"},"locationByFederation":"not found","requestIp":"24.252.192.0","locationByCoverageZone":"not found"}
>>>>>>>>>>>>
>>>>>>>>>>> I believe I'm expecting "locationByCoverageZone" to find something...
>>>>>>>>>>>
>>>>>>>>>>> I tried on 1.6.0 and 1.6.1 (patched with the pastebin above, which I
>>>>>>>>>>> wasn't sure I was supposed to do).
>>>>>>>>>>>
>>>>>>>>>>> Would you mind shedding some light on this?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Steve
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 23, 2017 at 3:05 PM, Jeff Elsloo
>>>>>> <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes; the feature went into 1.5.x.
>>>>>>>>>>>> --
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Jeff
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 19, 2017 at 10:37 AM, Steve Malenfant <
>>>>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> I didn't know about this, which is good information. Does that work on
>>>>>>>>>>>>> Traffic Router 1.6?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2017 at 12:44 PM, Eric Friedrich (efriedri) <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeff and I had a quick Slack convo, so I’ll add a followup summary here in
>>>>>>>>>>>>>> case anyone else is interested.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cache Group location (lat/long) is configured in Traffic Ops today (and is
>>>>>>>>>>>>>> used for computing distance from Maxmind Geolocation).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can also configure the location (lat/long) for a Cache Group in the
>>>>>>>>>>>>>> CoverageZone file (example below).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When this location is configured (and Jeff’s suggested logic fix from
>>>>>>>>>>>>>> below is applied) and all caches in the mapped cache group are unavailable,
>>>>>>>>>>>>>> TR will send a client request to the cache group that is closest to the
>>>>>>>>>>>>>> original mapped group.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Example CZF w/ cache location
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>> "coverageZones": {
>>>>>>>>>>>>>>   "edge-cg-1": {
>>>>>>>>>>>>>>     "network6": [
>>>>>>>>>>>>>>       ...
>>>>>>>>>>>>>>     ],
>>>>>>>>>>>>>>     "network": [
>>>>>>>>>>>>>>       ...
>>>>>>>>>>>>>>     ],
>>>>>>>>>>>>>>     "coordinates": {
>>>>>>>>>>>>>>       "longitude": "-75.3342",
>>>>>>>>>>>>>>       "latitude": "42.555"
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> —Eric
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jan 5, 2017, at 12:06 PM, Jeff Elsloo
>>>>>> <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we applied the proposed change, given your scenario we should fall
>>>>>>>>>>>>>>> through to the return statement that calls getClosestCacheLocation().
>>>>>>>>>>>>>>> That method will order all cache groups based on their lat/long and
>>>>>>>>>>>>>>> the lat/long of the cache group we hit on in the CZF. Once the list is
>>>>>>>>>>>>>>> ordered, we iterate through the list until we find a cache group that
>>>>>>>>>>>>>>> has available caches for that DS.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BTW, the stuff on line 536 is likely to produce the exact same result
>>>>>>>>>>>>>>> as the check that precedes it. networkNode.getLoc() will return the
>>>>>>>>>>>>>>> string name of the cache group, so when we find the CacheLocation, it
>>>>>>>>>>>>>>> will be the same as what we had just checked. We could probably get
>>>>>>>>>>>>>>> away with removing that part of the method as it's redundant.
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jan 4, 2017 at 11:54 AM, Eric Friedrich (efriedri)
>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>> Where would TR look outside the assigned cache group to find the next
>>>>>>>>>>>>>>>> closest cache group?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jan 4, 2017, at 11:25 AM, Eric Friedrich (efriedri) <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jan 3, 2017, at 5:20 PM, Jeff Elsloo
>>>>>> <[email protected]
>>>>>>>>>>>> <mailto:
>>>>>>>>>>>>>> [email protected]>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey Eric,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It sounds like the use case you're after is an RFC 1918 client
>>>>>>>>>>>>>>>>> associated with a cache group whose caches are all unavailable for one
>>>>>>>>>>>>>>>>> reason or another. Is that correct?
>>>>>>>>>>>>>>>>> Yes, exactly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I looked at the code a bit, and I think that we can make a minor
>>>>>>>>>>>>>>>>> change to achieve the behavior you're looking for as long as you're
>>>>>>>>>>>>>>>>> able to put your RFC 1918 ranges in the CZF.
>>>>>>>>>>>>>>>>> Yes, we would want those ranges in the CZF. I can’t think of any other
>>>>>>>>>>>>>>>>> place they would go.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There's a small logic gap in the existing algorithm around cache
>>>>>>>>>>>>>>>>> location selection and I think if we fix that (two line change), we
>>>>>>>>>>>>>>>>> should be better off all around. I think the only time we'd ever want
>>>>>>>>>>>>>>>>> to go to the geolocation provider is in the event of a miss on the
>>>>>>>>>>>>>>>>> CZF, so as long as we have a hit there, we should find the cache group
>>>>>>>>>>>>>>>>> closest to that hit location that has available caches. This would
>>>>>>>>>>>>>>>>> automatically provide the "backup" cache group concept, and has the
>>>>>>>>>>>>>>>>> added benefit of doing this selection dynamically based on the state
>>>>>>>>>>>>>>>>> of the CDN.
>>>>>>>>>>>>>>>>> Wow, thanks for picking up on this solution. Sounds like a strong
>>>>>>>>>>>>>>>>> possibility. I like that it can extend dynamically.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> See this to get an idea of what I mean: http://apaste.info/u3PQo
>>>>>>>>>>>>>>>>> https://github.com/apache/incubator-trafficcontrol/blob/249bd7504eeb7cc43402126f3719017e2475ad33/traffic_router/core/src/main/java/com/comcast/cdn/traffic_control/traffic_router/core/router/TrafficRouter.java#L536
>>>>>>>>>>>>>>>>> Does this line set cacheLocation to the closest cache group with
>>>>>>>>>>>>>>>>> active caches on that DS?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What does networkNode.getLoc() actually return?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> —Eric
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Obviously we'd need to test this to ensure we don't break other
>>>>>>>>>>>>>>>>> functionality.
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jan 3, 2017 at 10:07 AM, Eric Friedrich
>>>> (efriedri)
>>>>>>>>>>>>>>>>> <[email protected]<mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>> If all caches in the primary cache group are unavailable, our goal is
>>>>>>>>>>>>>>>>> to provide a backup routing policy for RFC1918 clients.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When the client IP is a public Internet IP, the current backup policy is
>>>>>>>>>>>>>>>>> to assign the client to the geographically closest cache (Distance =
>>>>>>>>>>>>>>>>> MaxMind Geo Lat/Long - configured CG lat/long).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When the client IP is an RFC1918 IP, the client would not have a MaxMind
>>>>>>>>>>>>>>>>> geo-loc, so would fall back to the DS geo-miss lat/long. We’d prefer some
>>>>>>>>>>>>>>>>> more granular control over where these clients are routed to, rather
>>>>>>>>>>>>>>>>> than a per-DS setting.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So with an RFC1918 client, the lookup process would be (step 3 is the
>>>>>>>>>>>>>>>>> only addition)
>>>>>>>>>>>>>>>>> 1) Check CZF for a subnet match (and find a match for existing cache
>>>>>>>>>>>>>>>>>    group). Assign client to CG
>>>>>>>>>>>>>>>>> 2) Check CG for available (online and associated w/ DS) servers. In
>>>>>>>>>>>>>>>>>    this particular case, assume CG has no servers available to route the
>>>>>>>>>>>>>>>>>    client to
>>>>>>>>>>>>>>>>> 3) Walk the CZF's list of backup CGs and perform the check from #2 for
>>>>>>>>>>>>>>>>>    each CG. Use first server that is found
>>>>>>>>>>>>>>>>> 4) Assuming no server is found in #3, perform geo-location and find
>>>>>>>>>>>>>>>>>    closest cache group. Use a server from the closest CG if one is found
>>>>>>>>>>>>>>>>> 4a) If geo-location returns null, use the DS’ default geo-miss
>>>>>>>>>>>>>>>>>    location as the client location.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> —Eric
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Dec 26, 2016, at 10:01 AM, Jan van Doorn
>>>>>> <[email protected]
>>>>>>>>>>>> <mailto:
>>>>>>>>>>>>>> [email protected]>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Eric,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How does the backup list relate to the RFC1918-is-not-in-geo problem?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To get to a cachegroup you need to get a match in the coverage zone, I
>>>>>>>>>>>>>>>>> would think?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Rgds,
>>>>>>>>>>>>>>>>> JvD
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Dec 22, 2016, at 12:28, Eric Friedrich (efriedri) <
>>>>>>>>>>>>>> [email protected]<mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The current behavior of cache group selection works as follows
>>>>>>>>>>>>>>>>> 1) Look for a subnet match in CZF
>>>>>>>>>>>>>>>>> 2) Use MaxMind/Neustar for GeoLocation based on client IP. Choose
>>>>>>>>>>>>>>>>>    closest cache group.
>>>>>>>>>>>>>>>>> 3) Use Delivery Service Geo-Miss Lat/Long. Choose closest cache group.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For deployments where IP addressing is primarily private (say RFC-1918
>>>>>>>>>>>>>>>>> addresses), client IP Geo Location (#2) is not useful.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We are considering adding another field to the Coverage Zone File that
>>>>>>>>>>>>>>>>> configures an ordered list of backup cache groups to try if the primary
>>>>>>>>>>>>>>>>> cache group does not have any available caches.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "coverageZones": {
>>>>>>>>>>>>>>>>> "cache-group-01": {
>>>>>>>>>>>>>>>>> "backupList": ["cache-group-02", "cache-group-03"],
>>>>>>>>>>>>>>>>> "network6": [
>>>>>>>>>>>>>>>>> "1234:5678::\/64",
>>>>>>>>>>>>>>>>> "1234:5679::\/64"],
>>>>>>>>>>>>>>>>> "network": [
>>>>>>>>>>>>>>>>> "192.168.8.0\/24",
>>>>>>>>>>>>>>>>> "192.168.9.0\/24"]
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This configuration could also be part of the per-cache group
>>>>>>>>>>>>>>>>> configuration, but that would give less control over which clients
>>>>>>>>>>>>>>>>> preferred which cache groups. For example, you may have cache groups in
>>>>>>>>>>>>>>>>> LA, Chicago and NY. If the Chicago Cache group fails, you may want some
>>>>>>>>>>>>>>>>> of the Chicago clients to go to LA and some to go to NY. If the backup CG
>>>>>>>>>>>>>>>>> configuration is per-cg, we would not be able to control where clients
>>>>>>>>>>>>>>>>> are allocated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking for opinions and comments on the above proposal, this is still
>>>>>>>>>>>>>>>>> in idea stage.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks All!
>>>>>>>>>>>>>>>>> Eric
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>>
>>>    --
>>>
>>>    *Oren Shemesh*
>>>    Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | [email protected]
>>>    <[email protected]>
>>>
>>>
>
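
For readers following the thread, the "client location" rule confirmed at the top (use the CZF coordinates on a CZF hit, otherwise ask the Geolocation provider) might be sketched roughly as below. This is only an illustration in Python, not Traffic Router's Java code: the names `czf_lookup`, `client_location`, and the `geolocate` callback are hypothetical, the CZF fragment is modeled on the us-ga-macon example quoted in the thread, and taking the first matching subnet is a simplifying assumption.

```python
import ipaddress

# Hypothetical CZF fragment, modeled on the us-ga-macon example in the thread.
CZF = {
    "coverageZones": {
        "us-ga-macon": {
            "coordinates": {"latitude": "32.7261", "longitude": "-83.6547"},
            "network": ["24.252.192.0/20", "68.1.20.0/22"],
            "network6": [],
        }
    }
}


def czf_lookup(czf, client_ip):
    """Return (cache_group, coords_or_None) for the first zone whose subnet
    contains client_ip, or None on a CZF miss. First-match is an assumption."""
    addr = ipaddress.ip_address(client_ip)
    key = "network6" if addr.version == 6 else "network"
    for name, zone in czf["coverageZones"].items():
        for cidr in zone.get(key, []):
            if addr in ipaddress.ip_network(cidr):
                coords = zone.get("coordinates")
                if coords is None:
                    return name, None
                return name, (float(coords["latitude"]),
                              float(coords["longitude"]))
    return None


def client_location(czf, client_ip, geolocate):
    """Prefer the CZF coordinates; fall back to the geolocation callback on a
    CZF miss or when the matched zone carries no coordinates."""
    hit = czf_lookup(czf, client_ip)
    if hit is not None and hit[1] is not None:
        return hit[1]
    return geolocate(client_ip)
```

With this sketch, an RFC 1918 address listed in the CZF never reaches the geolocation callback, which is the point made about private addressing earlier in the thread.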

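The "next best cache group" selection described in the thread (order the cache groups by distance from the CZF hit's coordinates, then take the first one with available caches) could look roughly like the following. Again this is a hedged Python sketch rather than the actual getClosestCacheLocation() implementation: the haversine distance, the `has_available_caches` callback, and the function names are assumptions made for illustration. The coordinates are the lab values Steve listed; the source point comes from the CZF and the candidates from Traffic Ops (edgeLocations).

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees.
    The distance metric is an assumption; the thread does not specify one."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_available_group(source, edge_locations, has_available_caches):
    """Sort candidate cache groups by distance from `source` and return the
    first one the (hypothetical) availability callback accepts, else None."""
    ordered = sorted(edge_locations.items(),
                     key=lambda item: haversine_km(source, item[1]))
    for name, _coords in ordered:
        if has_available_caches(name):
            return name
    return None


# Candidate cache groups as configured in Traffic Ops (lab values from the thread).
EDGE_LOCATIONS = {
    "us-ga-atlanta": (34.0362, -84.3207),
    "us-ok-oklahomacity": (35.4777, -97.5545),
    "us-va-nova": (38.7922, -77.2136),
    "us-ca-sandiego": (32.7205, -117.0838),
}

# Source coordinates taken from the CZF entry for us-ga-macon.
MACON = (32.7261, -83.6547)
```

Passing a callback that rejects us-ga-atlanta models the case where the nearest group has no healthy caches assigned to the delivery service, so the selection moves on to the next closest group instead of falling through to the geolocation provider.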