If the developers list is OK with using apr_atomic in the server core, that
approach would have several advantages over trylock:
1. No need for child init.
2. No need for function pointers.
3. We could have a lock per cache element (I deemed a full mutex structure
per cache element too expensive memory-wise).
4. It would avoid the problem of trylock not being implemented on all
platforms.
5. Fewer parameters to the function macro.
The code would be like this:
#define TIME_CACHE_FUNCTION(VALUE_SIZE, CACHE_T, CACHE_PTR, \
                            CACHE_SIZE_POWER, CALC_FUNC, AFTER_READ_WORK) \
    const apr_int64_t seconds = apr_time_sec(t);\
    apr_status_t status;\
    CACHE_T * const cache_element = \
        &(CACHE_PTR[seconds & ((1<<CACHE_SIZE_POWER)-1)]);\
    /* seconds==0 can be confused with an uninitialized cache; don't use the cache */\
    if (seconds==0) return CALC_FUNC(value, t);\
    /* cas32 returns the old value: 0 means we took the lock (0 -> 1) */\
    if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
        if (seconds == cache_element->key) {\
            memcpy(value, &cache_element->value, VALUE_SIZE);\
            apr_atomic_dec32(&cache_element->lock);\
            AFTER_READ_WORK;\
            return APR_SUCCESS;\
        }\
        if (seconds < cache_element->key) {\
            apr_atomic_dec32(&cache_element->lock);\
            return CALC_FUNC(value, t);\
        }\
        apr_atomic_dec32(&cache_element->lock);\
    }\
    status = CALC_FUNC(value, t);\
    if (status == APR_SUCCESS) {\
        if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
            if (seconds > cache_element->key) {\
                cache_element->key = seconds;\
                memcpy(&cache_element->value, value, VALUE_SIZE);\
            }\
            apr_atomic_dec32(&cache_element->lock);\
        }\
    }\
    return status;
--------------------------------------------------
typedef struct {
    apr_int64_t key;
    apr_uint32_t lock;
    apr_time_exp_t value;
} explode_time_cache_t;

TIME_CACHE(explode_time_cache_t, explode_time_lt_cache, TIME_CACHE_SIZE_POWER)

AP_DECLARE(apr_status_t) ap_explode_recent_localtime(apr_time_exp_t *value,
                                                     apr_time_t t)
{
    TIME_CACHE_FUNCTION(
        sizeof(apr_time_exp_t), explode_time_cache_t, explode_time_lt_cache,
        TIME_CACHE_SIZE_POWER, apr_time_exp_lt,
        value->tm_usec = (apr_int32_t) apr_time_usec(t))
}
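
To spell out the apr_atomic semantics: apr_atomic_cas32(mem, with, cmp)
atomically stores 'with' into *mem only if *mem equals cmp, and returns the
old value of *mem, so a return of 0 means the lock word went 0 -> 1 and we
hold the lock. A minimal sketch of the pattern (illustration only; lock_word
here is just a stand-in for cache_element->lock):

--------------------------------------------------
static apr_uint32_t lock_word;  /* static, so zero-initialized: 0 = free */

if (apr_atomic_cas32(&lock_word, 1, 0) == 0) {
    /* uncontended: safe to read or update the cache element */
    apr_atomic_dec32(&lock_word);  /* 1 -> 0 releases the lock */
}
else {
    /* contended: skip the cache and just calculate the value */
}
--------------------------------------------------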
On Tue, Dec 3, 2013 at 10:53 AM, Daniel Lescohier <[email protected]> wrote:
> I infer that you think that in this particular case, the function macro
> makes the code more readable and maintainable. I don't think one can
> define an absolute rule; it's a judgment call whether a macro increases
> or decreases clarity and maintainability. It reminds me of 'goto': most
> of the time one shouldn't use it, but there are cases where it's better
> than the alternative.
>
>
> On Tue, Dec 3, 2013 at 10:01 AM, Jim Jagielski <[email protected]> wrote:
>
>> At first blush, the below looks very workable! On a slightly
>> off-topic topic, however, I wonder if macros are the way to
>> go. Long long ago function calls were really really expensive.
>> Does the decreased clarity (and debugging capability) really
>> offset the saved cycles? Agreed that it's not fully applicable
>> below.
>>
>> On Dec 2, 2013, at 6:13 PM, Daniel Lescohier <[email protected]> wrote:
>>
>> > The current time caching implementation in util_time.c and
>> > mod_log_config.c is not guaranteed to work with all compilers and CPU
>> > architectures. I've written some proposed code that I'd like input on
>> > from the mailing list before continuing on to compile and test it.
>> >
>> > The old implementation attempts a wait-free algorithm, but incorrectly:
>> > the algorithm relies on total memory ordering, which the implementation
>> > does not guarantee from either the compiler or the CPU.
>> >
>> > My proposal is to use the standard and portable
>> > apr_thread_mutex_trylock for enforcing the memory barriers. In APR's
>> > Linux x86_64 implementation, the trylock basically turns into a
>> > lock-prefixed cmpxchgl instruction, and the unlock into a lock-prefixed
>> > decl instruction. Because only trylock is used, never lock, the futex is
>> > never spun on or arbitrated by the kernel; it's all done in userspace
>> > (if there's contention, each thread just calculates the value itself
>> > instead of reading from the cache). So the replacement is also a
>> > wait-free algorithm, using the standard and portable apr_thread_mutex
>> > calls. Also, the old algorithm does two memcpy operations when reading
>> > from the cache, while the new algorithm does only one. With
>> > APR_HAS_THREADS==0 or a non-threaded MPM in use, no locking is done at
>> > all.
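>> >
>> > In outline, every cache access follows this pattern (a sketch of the
>> > approach; cache_lock stands in for the per-cache mutex):
>> >
>> > if (apr_thread_mutex_trylock(cache_lock) == APR_SUCCESS) {
>> >     /* uncontended: read from or update the shared cache element */
>> >     apr_thread_mutex_unlock(cache_lock);
>> > }
>> > else {
>> >     /* contended: do useful work by calculating the value directly */
>> >     status = apr_time_exp_lt(value, t);
>> > }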
>> >
>> > The first part is generic time-caching code I've written in
>> > util_time_caching.h. I use these macros to create five different time
>> > caches:
>> >
>> > excerpt w/o boilerplate:
>> >
>> > -----------------------------------
>> > #define TIME_CACHE(CACHE_T, CACHE_PTR, CACHE_SIZE_POWER)\
>> > static CACHE_T CACHE_PTR[1<<CACHE_SIZE_POWER];
>> >
>> > #define TIME_CACHE_FUNC_PTR(FUNC_PTR, VALUE_T, CALC_FUNC)\
>> > static apr_status_t (*FUNC_PTR)(VALUE_T *, apr_time_t) = &CALC_FUNC;
>> >
>> > /* TIME_CACHE_FUNCTION macro:
>> > * The cache is implemented as a ring buffer. The key in the
>> > * cache element indicates which second the cache value is for.
>> > * We use a wait-free algorithm: if we cannot access the cache,
>> > * we just calculate the value, doing useful work, instead of
>> > * spinning trying to access the cache.
>> > * We only update the cache with newer times, because older times
>> > * should have a lower cache hit ratio, and we want to keep the
>> > * caches small to fit in the CPU L1/L2 caches.
>> > */
>> >
>> > #define TIME_CACHE_FUNCTION(FUNC_NAME, VALUE_T, VALUE_SIZE, \
>> >                             CACHE_T, CACHE_PTR, CACHE_SIZE_POWER, \
>> >                             CALC_FUNC, TRY_LOCK, RELEASE_LOCK, AFTER_READ_WORK)\
>> > static apr_status_t FUNC_NAME(VALUE_T *value, apr_time_t t)\
>> > {\
>> >     const apr_int64_t seconds = apr_time_sec(t);\
>> >     apr_status_t status;\
>> >     CACHE_T * const cache_element = \
>> >         &(CACHE_PTR[seconds & ((1<<CACHE_SIZE_POWER)-1)]);\
>> >     /* seconds==0 can be confused with an uninitialized cache; don't use the cache */\
>> >     if (seconds==0) return CALC_FUNC(value, t);\
>> >     if (TRY_LOCK) {\
>> >         if (seconds == cache_element->key) {\
>> >             memcpy(value, &cache_element->value, VALUE_SIZE);\
>> >             RELEASE_LOCK;\
>> >             AFTER_READ_WORK;\
>> >             return APR_SUCCESS;\
>> >         }\
>> >         if (seconds < cache_element->key) {\
>> >             RELEASE_LOCK;\
>> >             return CALC_FUNC(value, t);\
>> >         }\
>> >         RELEASE_LOCK;\
>> >     }\
>> >     status = CALC_FUNC(value, t);\
>> >     if (status == APR_SUCCESS) {\
>> >         if (TRY_LOCK) {\
>> >             if (seconds > cache_element->key) {\
>> >                 cache_element->key = seconds;\
>> >                 memcpy(&cache_element->value, value, VALUE_SIZE);\
>> >             }\
>> >             RELEASE_LOCK;\
>> >         }\
>> >     }\
>> >     return status;\
>> > }
>> > ----------------------------------------------
>> >
>> > Here is an example of implementing one of the caches in util_time.c:
>> >
>> > ----------------------------------------------
>> > /* Have small-sized caches, to stay in the CPU's L1/L2 caches.
>> >  * There should be few requests older than 3 seconds, so set the
>> >  * cache size power to 2 (1<<2 = 4 elements per cache).
>> >  */
>> > #define TIME_CACHE_SIZE_POWER 2
>> >
>> > typedef struct {
>> >     apr_int64_t key;
>> >     apr_time_exp_t value;
>> > } explode_time_cache_t;
>> >
>> > TIME_CACHE(explode_time_cache_t, explode_time_lt_cache,
>> >            TIME_CACHE_SIZE_POWER)
>> > TIME_CACHE_FUNC_PTR(explode_time_lt_ptr, apr_time_exp_t,
>> >                     apr_time_exp_lt)
>> > TIME_CACHE_FUNCTION(
>> >     explode_time_lt_nolocking, apr_time_exp_t, sizeof(apr_time_exp_t),
>> >     explode_time_cache_t, explode_time_lt_cache, TIME_CACHE_SIZE_POWER,
>> >     apr_time_exp_lt, 1 /* TRY_LOCK always succeeds */,
>> >     /* no RELEASE_LOCK */,
>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>> > #if APR_HAS_THREADS
>> > static apr_thread_mutex_t *explode_time_lt_lock;
>> > TIME_CACHE_FUNCTION(
>> >     explode_time_lt_locking, apr_time_exp_t, sizeof(apr_time_exp_t),
>> >     explode_time_cache_t, explode_time_lt_cache, TIME_CACHE_SIZE_POWER,
>> >     apr_time_exp_lt,
>> >     apr_thread_mutex_trylock(explode_time_lt_lock)==APR_SUCCESS,
>> >     apr_thread_mutex_unlock(explode_time_lt_lock),
>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>> > #endif
>> >
>> > AP_DECLARE(apr_status_t) ap_explode_recent_localtime(apr_time_exp_t *tm,
>> >                                                      apr_time_t t)
>> > {
>> >     return explode_time_lt_ptr(tm, t);
>> > }
>> > -------------------------------------------------
>> >
>> > The function pointer initially points to the uncached CALC_FUNC; only
>> > after child_init has run is the pointer replaced with the function that
>> > uses the cache. Because of the function-pointer indirection, the API and
>> > ABI of ap_explode_recent_localtime stay the same.
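>> >
>> > Concretely, for the localtime cache the TIME_CACHE_FUNC_PTR line above
>> > expands to roughly the following, and child_init later swaps in
>> > explode_time_lt_locking or explode_time_lt_nolocking:
>> >
>> > static apr_status_t (*explode_time_lt_ptr)(apr_time_exp_t *, apr_time_t)
>> >     = &apr_time_exp_lt;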
>> >
>> > I've implemented five caches: explode local/gmt, rfc822 date, and
>> > common log format local/gmt. The rfc822 cache caches the formatted
>> > string, whereas the old implementation cached only the exploded time
>> > structure and therefore re-formatted the string on every request.
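>> >
>> > For example, the rfc822 cache element could look like this (a
>> > hypothetical sketch following the same pattern; the exact struct is in
>> > the full patch):
>> >
>> > typedef struct {
>> >     apr_int64_t key;
>> >     char value[APR_RFC822_DATE_LEN];  /* the formatted date string */
>> > } rfc822_date_cache_t;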
>> >
>> > Here is the child init function in util_time.c that will be referenced
>> in server/core.c:
>> >
>> > -------------------------------------------------
>> > void ap_util_time_child_init(apr_pool_t *p, server_rec *s)
>> > {
>> > #if APR_HAS_THREADS
>> >     int threaded_mpm;
>> >     if (ap_mpm_query(AP_MPMQ_IS_THREADED, &threaded_mpm) == APR_SUCCESS
>> >         && threaded_mpm)
>> >     {
>> > #define init_cache_func(mutex, func_ptr, func_name)\
>> >         apr_thread_mutex_create(&mutex, APR_THREAD_MUTEX_DEFAULT, p);\
>> >         func_ptr = &func_name;
>> >
>> >         init_cache_func(explode_time_lt_lock, explode_time_lt_ptr,
>> >                         explode_time_lt_locking)
>> >         init_cache_func(explode_time_gmt_lock, explode_time_gmt_ptr,
>> >                         explode_time_gmt_locking)
>> >         init_cache_func(rfc822_date_lock, rfc822_date_ptr,
>> >                         rfc822_date_locking)
>> >         init_cache_func(clf_time_local_lock, clf_time_local_ptr,
>> >                         clf_time_local_locking)
>> >         init_cache_func(clf_time_gmt_lock, clf_time_gmt_ptr,
>> >                         clf_time_gmt_locking)
>> >     }
>> >     else
>> >     {
>> > #endif
>> >         explode_time_lt_ptr = &explode_time_lt_nolocking;
>> >         explode_time_gmt_ptr = &explode_time_gmt_nolocking;
>> >         rfc822_date_ptr = &rfc822_date_nolocking;
>> >         clf_time_local_ptr = &clf_time_local_nolocking;
>> >         clf_time_gmt_ptr = &clf_time_gmt_nolocking;
>> > #if APR_HAS_THREADS
>> >     }
>> > #endif
>> > }
>> > -------------------------------------------------------------
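>> >
>> > In server/core.c, this would be wired up from the core module's
>> > child_init hook, e.g. (a sketch, assuming the usual hook registration):
>> >
>> > ap_hook_child_init(ap_util_time_child_init, NULL, NULL,
>> >                    APR_HOOK_MIDDLE);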
>> >
>> > Comments?
>> >
>>
>>
>