[ https://issues.apache.org/jira/browse/TC-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063811#comment-16063811 ]
ASF GitHub Bot commented on TC-401:
-----------------------------------
GitHub user elsloo opened a pull request:
https://github.com/apache/incubator-trafficcontrol/pull/702
[TC-401] Fixes a race condition related to lazy loading of CacheLocat…
…ions on a NetworkNode when state changes occur from OFFLINE<->ONLINE
within a CRConfig snapshot.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/elsloo/incubator-trafficcontrol 2.1.x_fix_network_node_lazy_loading_race
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafficcontrol/pull/702.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #702
----
commit 37a978e24e2d79b0441a0d3bb3ca2fe9dc6fac66
Author: Jeff Elsloo <[email protected]>
Date: 2017-06-26T21:19:38Z
[TC-401] Fixes a race condition related to lazy loading of CacheLocations
on a NetworkNode when state changes occur from OFFLINE<->ONLINE within a
CRConfig snapshot.
----
> Traffic Router Serves OFFLINE Caches
> ------------------------------------
>
> Key: TC-401
> URL: https://issues.apache.org/jira/browse/TC-401
> Project: Traffic Control
> Issue Type: Bug
> Components: Traffic Router
> Affects Versions: 2.0.0
> Reporter: Jeff Elsloo
> Fix For: 2.1.0
>
>
> We identified an issue that causes Traffic Router to serve up an {{OFFLINE}}
> cache indefinitely after a snapshot of the CRConfig. The inverse also occurs:
> a cache that was previously set to {{OFFLINE}} will never
> have traffic routed to it when set back to {{ONLINE}} or {{REPORTED}}
> (referenced only as {{ONLINE}} henceforth).
> The bug is caused by {{ConfigHandler.processConfig()}} clearing the cache
> locations from the {{NetworkNode}} prior to swapping out the instance of
> {{CacheRegister}}. When the cache locations have been cleared, but the prior
> {{CacheRegister}} is still in place, a race condition can occur where the
> {{CacheLocation}} for a given cache group from the prior config will be set
> on the recently cleared {{NetworkNode}}. When this happens, the
> {{List<Cache>}} contains the prior config's list for that cache group, which
> means that any host state change from/to {{ONLINE}} or {{OFFLINE}} will not
> be reflected. This is because the {{Cache}} drops out of the CRConfig when
> set to {{OFFLINE}} and reappears when set back to {{ONLINE}}. Contrast this
> with a change from {{ONLINE}} to {{ADMIN_DOWN}}, where the {{Cache}} remains
> in the CRConfig: there we simply use the status to determine whether the
> cache is available, and the software works as designed.
> The race exists because we use lazy loading to associate network ranges within
> the CZF with {{CacheLocations}} within a given {{NetworkNode}} representing
> that section of the CZF. In {{TrafficRouter}}, during cache selection, if we
> have a hit in the coverage zone file but the {{CacheLocation}} is
> uninitialized, we obtain the {{CacheLocation}} from {{CacheRegister}} and set
> it for that specific {{NetworkNode}}. If our {{NetworkNode}} has been cleared
> but our {{CacheRegister}} has yet to be swapped, we will set the
> {{NetworkNode}} to the old {{CacheLocation}}, which, as mentioned, holds a
> reference to the prior {{List<Cache>}}. Because the {{NetworkNode}} is then no
> longer uninitialized, nothing ever populates it with the new
> {{CacheLocation}} and new {{List<Cache>}}.
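The interleaving described above can be reproduced deterministically in a small, single-threaded sketch. Everything below is a simplified stand-in written for illustration: the class shapes, method names, cache group id "cg-east" and host names are assumptions and do not mirror the real Traffic Router classes. The three steps in simulateStaleLookup() play out the clear, the lazy load against the old register, and the swap, in the order the issue describes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical, heavily simplified stand-ins for
// CacheLocation, CacheRegister, NetworkNode and TrafficRouter.
public class LazyLoadingRaceSketch {

    // A cache group plus the caches it contained in one CRConfig snapshot.
    static final class CacheLocation {
        final String id;
        final List<String> caches;
        CacheLocation(String id, List<String> caches) {
            this.id = id;
            this.caches = caches;
        }
    }

    // Maps cache group ids to the CacheLocation built from one snapshot.
    static final class CacheRegister {
        private final Map<String, CacheLocation> locations = new HashMap<>();
        void put(CacheLocation location) { locations.put(location.id, location); }
        CacheLocation getCacheLocation(String id) { return locations.get(id); }
    }

    // A coverage zone entry whose CacheLocation is filled in lazily.
    static final class NetworkNode {
        final String locationId;
        private CacheLocation cacheLocation; // null means "uninitialized"
        NetworkNode(String locationId) { this.locationId = locationId; }
        CacheLocation getCacheLocation() { return cacheLocation; }
        void setCacheLocation(CacheLocation location) { cacheLocation = location; }
        void clearCacheLocation() { cacheLocation = null; }
    }

    // Cache selection: lazy-load from whichever register is currently installed.
    static final class Router {
        CacheRegister register;
        CacheLocation select(NetworkNode node) {
            if (node.getCacheLocation() == null) {
                node.setCacheLocation(register.getCacheLocation(node.locationId));
            }
            return node.getCacheLocation();
        }
    }

    // Plays the problematic ordering step by step and returns the caches that
    // would be served after the new snapshot has been installed.
    static List<String> simulateStaleLookup() {
        CacheRegister oldRegister = new CacheRegister();
        oldRegister.put(new CacheLocation("cg-east", Arrays.asList("cache-01", "cache-02")));

        // New snapshot: cache-02 was set OFFLINE, so it is absent from the CRConfig.
        CacheRegister newRegister = new CacheRegister();
        newRegister.put(new CacheLocation("cg-east", Arrays.asList("cache-01")));

        Router router = new Router();
        router.register = oldRegister;
        NetworkNode node = new NetworkNode("cg-east");

        node.clearCacheLocation();      // 1. config processing clears the node
        router.select(node);            // 2. a request lazy-loads from the OLD register
        router.register = newRegister;  // 3. only now is the register swapped

        // The node is no longer uninitialized, so the new register is never consulted.
        return router.select(node).caches;
    }

    public static void main(String[] args) {
        System.out.println(simulateStaleLookup()); // prints [cache-01, cache-02]
    }
}
```

In this sketch the node keeps handing out cache-02 even though the installed register no longer contains it, which matches the "serves OFFLINE caches" symptom. In the real system step 2 happens on a request thread concurrently with config processing; the sketch only fixes the ordering to make the outcome repeatable.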
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)