ywkaras commented on a change in pull request #7281:
URL: https://github.com/apache/trafficserver/pull/7281#discussion_r527264937
##########
File path: plugins/s3_auth/s3_auth.cc
##########
@@ -496,40 +598,56 @@ ConfigCache::get(const char *fname)
auto it = _cache.find(config_fname);
if (it != _cache.end()) {
- if (tv.tv_sec > (it->second.second + _ttl)) {
- // Update the cached configuration file.
- S3Config *s3 = new S3Config(false); // false == this config does not get
the continuation
-
- TSDebug(PLUGIN_NAME, "Configuration from %s is stale, reloading",
config_fname.c_str());
- it->second.second = tv.tv_sec;
- if (s3->parse_config(config_fname)) {
- it->second.first = s3;
+ unsigned update_status = it->second.update_status;
+ if (tv.tv_sec > (it->second.load_time + _ttl)) {
+ if (!(update_status & 1) &&
it->second.update_status.compare_exchange_strong(update_status, update_status +
1)) {
+ TSDebug(PLUGIN_NAME, "Configuration from %s is stale, reloading",
config_fname.c_str());
+ s3 = new S3Config(false); // false == this config does not get the
continuation
+
+ if (s3->parse_config(config_fname)) {
+ s3->set_conf_fname(fname);
+ } else {
+ // Failed the configuration parse... Set the cache response to
nullptr
+ delete s3;
+ s3 = nullptr;
+ TSAssert(!"Configuration parsing / caching failed");
+ }
+
+ delete it->second.config;
+ it->second.config = s3;
+ it->second.load_time = tv.tv_sec;
+
+ // Update is complete.
+ ++it->second.update_status;
} else {
- // Failed the configuration parse... Set the cache response to nullptr
- delete s3;
- it->second.first = nullptr;
+ // This thread lost the race with another thread that is also reloading
+ // the config for this file. Wait for the other thread to finish
reloading.
+ while (it->second.update_status & 1) {
+ // Hopefully yielding will sleep the thread at least until the next
+ // scheduler interrupt, preventing a busy wait.
+ std::this_thread::yield();
+ }
+ s3 = it->second.config;
Review comment:
Unfortunately, as much as I have already tortured poor Fei about this
PR, I think there is still a remaining hypothetical race condition. Suppose a
thread executes through line 630, and is preempted at some point while the
value in s3 is still in use. While it is not running, suppose a second thread
executes line 602, with a tv.tv_sec value such that the condition for the if
statement is true. The second thread then continues on and executes line 616,
before the first thread runs again. When the first thread is run again, it
will do a use-after-free.
Since this seems extremely unlikely, I think we should go ahead and merge
this PR, and then see how/if to fix this concern.
If we were using C++20, we could simply make ConfigCache::get() return a
shared_ptr, and make the config field in _Data an atomic shared_ptr. How soon
are we going to 20?
Barring that, we can make the shared_ptr approach work by replacing
update_status with a mutex. This would have the added benefit of getting rid
of a lot tricky logic I asked Fei to put in this function. (Note that this
function never creates new map entries when the map is being accessed by
multiple threads, so we only have to lock the mutex when accessing an existing
map entry.). Since that could potential be a fair number of mutexes, but I
think they would be low-contention, I suggest using a "slimmed-down"
alternative mutex: https://github.com/apache/trafficserver/pull/7330
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]