Re: [asterisk-dev] [Code Review] 3405: Add ast_spinlock capability to lock.h

2014-03-28 Thread George Joseph

---
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3405/
---

(Updated March 28, 2014, 10:56 a.m.)


Review request for Asterisk Developers.


Changes
---

Added link to test harness and test results.


Bugs: ASTERISK-23553
https://issues.asterisk.org/jira/browse/ASTERISK-23553


Repository: Asterisk


Description (updated)
---

Still testing but I'd like some initial feedback.

In some circumstances the atomic fetch/add/test calls are not quite flexible 
enough, but a full-fledged mutex or rwlock is too expensive.

Spin locks are a good solution.  They should be used only for protecting very 
short blocks of critical code such as simple compares combined with integer 
math.  Operations that may block, hold a lock, or cause the thread to give up 
its timeslice should NEVER be attempted in a spin lock.
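
To illustrate the kind of critical section meant here, the following is a 
minimal sketch and not code from the patch.  It assumes GCC's __sync builtins 
are available, and every demo_ name is hypothetical.  The compare and the add 
have to happen as one step, which a single atomic fetch/add cannot express:

```c
#include <assert.h>

/* Hypothetical spin lock built on GCC's __sync builtins (an assumption). */
typedef volatile int demo_spinlock_t;

static void demo_lock(demo_spinlock_t *lock)
{
	/* The builtin returns the previous value; non-zero means it was held. */
	while (__sync_lock_test_and_set(lock, 1)) {
		while (*lock) {
			/* spin on plain reads until the lock looks free */
		}
	}
}

static void demo_unlock(demo_spinlock_t *lock)
{
	__sync_lock_release(lock);
}

/* Hypothetical structure: a counter that must never exceed its limit. */
struct demo_bucket {
	demo_spinlock_t lock;
	int used;
	int limit;
};

/* Take n units if they fit.  Returns 1 on success, 0 if the limit would be exceeded. */
static int demo_bucket_take(struct demo_bucket *bucket, int n)
{
	int ok;

	assert(bucket != 0);
	demo_lock(&bucket->lock);
	ok = (bucket->used + n <= bucket->limit);   /* simple compare */
	if (ok) {
		bucket->used += n;                      /* integer math */
	}
	demo_unlock(&bucket->lock);
	return ok;
}
```

Nothing inside the locked region can block or sleep, so a waiting thread only 
ever spins for a handful of instructions.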

So, this patch adds the following APIs to lock.h:

ast_spinlock_init
ast_spinlock_lock
ast_spinlock_trylock
ast_spinlock_unlock

Depending on the capabilities determined by configure, one of the following 
implementations is chosen, in order of preference: OSX Atomics (OSX only), 
GCC Atomics, Pthread Spinlock and, as a final fallback, Pthread Mutex.
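
The preference order above could be sketched roughly as follows.  This is an 
illustration under assumptions, not the code in the patch: the demo_ names and 
the DEMO_HAVE_PTHREAD_SPINLOCK macro are hypothetical, __GNUC__ stands in for 
a real configure check for GCC atomics, and the OSX branch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#if defined(__GNUC__)
/* First choice in this sketch: GCC atomic builtins, no library call at all. */
typedef volatile int demo_spinlock_t;

static inline int demo_spinlock_init(demo_spinlock_t *lock)
{
	assert(lock != 0);
	*lock = 0;
	return 0;
}

static inline int demo_spinlock_lock(demo_spinlock_t *lock)
{
	/* The builtin returns the previous value; non-zero means it was held. */
	while (__sync_lock_test_and_set(lock, 1)) {
		while (*lock) {
			/* spin on plain reads until the lock looks free */
		}
	}
	return 0;
}

static inline int demo_spinlock_trylock(demo_spinlock_t *lock)
{
	return __sync_lock_test_and_set(lock, 1) ? EBUSY : 0;
}

static inline int demo_spinlock_unlock(demo_spinlock_t *lock)
{
	__sync_lock_release(lock);
	return 0;
}

#elif defined(DEMO_HAVE_PTHREAD_SPINLOCK)
/* Second choice: the pthread spin lock, where the platform provides it. */
typedef pthread_spinlock_t demo_spinlock_t;

static inline int demo_spinlock_init(demo_spinlock_t *lock)
{
	return pthread_spin_init(lock, PTHREAD_PROCESS_PRIVATE);
}

static inline int demo_spinlock_lock(demo_spinlock_t *lock)
{
	return pthread_spin_lock(lock);
}

static inline int demo_spinlock_trylock(demo_spinlock_t *lock)
{
	return pthread_spin_trylock(lock);
}

static inline int demo_spinlock_unlock(demo_spinlock_t *lock)
{
	return pthread_spin_unlock(lock);
}

#else
/* Final fallback: a plain mutex, which may sleep in the kernel under contention. */
typedef pthread_mutex_t demo_spinlock_t;

static inline int demo_spinlock_init(demo_spinlock_t *lock)
{
	return pthread_mutex_init(lock, NULL);
}

static inline int demo_spinlock_lock(demo_spinlock_t *lock)
{
	return pthread_mutex_lock(lock);
}

static inline int demo_spinlock_trylock(demo_spinlock_t *lock)
{
	return pthread_mutex_trylock(lock);
}

static inline int demo_spinlock_unlock(demo_spinlock_t *lock)
{
	return pthread_mutex_unlock(lock);
}
#endif
```

Callers see the same four functions whichever branch is compiled in.  In all 
three branches trylock returns 0 on success and EBUSY when the lock is 
already held.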

Performance...

Test harness: https://github.com/gtjoseph/spintest
Results are in spintest.csv

pthread adaptive mutexes are supposed to give you the best of both spin and 
mutex locks, but testing shows that in this scenario they are always worse than 
a plain mutex.  Although they do use less kernel time than plain mutexes, I 
removed the implementation from this patch.

pthread_mutex is universally supported but shows the effect of context 
switching when there's lock contention.  It's the last resort and perhaps should 
be removed in favor of a #error stating that no spinlock implementation could 
be found.

pthread_spinlock is gaining support but is not in all pthread implementations 
(OSX lacks it, for one).  It uses no kernel time at all.

gcc_atomics is also gaining support and seems to be more widely supported than 
pthread_spinlock.  It uses no kernel time at all.

With infrequent lock contention, both gcc_atomics and pthread_spinlock are 
comparable to ast_atomic_fetchadd_int in performance.

Although I don't have any empirical data to back me up (yet), I believe that with 
osx_atomics, gcc_atomics and pthread_spinlock, all major platforms are supported.

EDIT:  I forgot to mention I'm working on a GAS fallback.


Diffs
---

  branches/12/include/asterisk/lock.h 411364 
  branches/12/include/asterisk/autoconfig.h.in 411364 
  branches/12/configure.ac 411364 
  branches/12/configure UNKNOWN 

Diff: https://reviewboard.asterisk.org/r/3405/diff/


Testing
---


Thanks,

George Joseph

-- 
_
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

Re: [asterisk-dev] [Code Review] 3405: Add ast_spinlock capability to lock.h

2014-03-27 Thread George Joseph


(Updated March 27, 2014, 2:47 p.m.)


Review request for Asterisk Developers.


Bugs: ASTERISK-23553
https://issues.asterisk.org/jira/browse/ASTERISK-23553


Repository: Asterisk


Description (updated)
---

Still testing but I'd like some initial feedback.

In some circumstances the atomic fetch/add/test calls are not quite flexible 
enough, but a full-fledged mutex or rwlock is too expensive.

Spin locks are a good solution.  They should be used only for protecting very 
short blocks of critical code such as simple compares combined with integer 
math.  Operations that may block, hold a lock, or cause the thread to give up 
its timeslice should NEVER be attempted in a spin lock.

So, this patch adds the following APIs to lock.h:

ast_spinlock_init
ast_spinlock_lock
ast_spinlock_trylock
ast_spinlock_unlock

Depending on the capabilities determined by configure, one of the following 
implementations is chosen, in order of preference: OSX Atomics (OSX only), 
GCC Atomics, Pthread Spinlock and, as a final fallback, Pthread Mutex.

Performance...

Simple test: 25,000,000 iterations of (lock, test, calculate, test, unlock) per 
thread.
All times are in milliseconds.  I have no way to test OSX Atomics.

gcc_atomics       Threads:  1  Real:   204, User:    20, Sys:     0, Tot:    20
pthread_spinlock  Threads:  1  Real:   194, User:    19, Sys:     0, Tot:    19
pthread_mutex     Threads:  1  Real:   464, User:    46, Sys:     0, Tot:    46
pthread_adaptive  Threads:  1  Real:   464, User:    46, Sys:     0, Tot:    46

gcc_atomics       Threads:  2  Real:  1016, User:   191, Sys:     0, Tot:   191
pthread_spinlock  Threads:  2  Real:  1142, User:   209, Sys:     0, Tot:   209
pthread_mutex     Threads:  2  Real:  2902, User:   362, Sys:   201, Tot:   563
pthread_adaptive  Threads:  2  Real:  4557, User:   896, Sys:     9, Tot:   905

gcc_atomics       Threads:  3  Real:  2273, User:   556, Sys:     0, Tot:   556
pthread_spinlock  Threads:  3  Real:  2318, User:   589, Sys:     0, Tot:   589
pthread_mutex     Threads:  3  Real:  6257, User:   834, Sys:   947, Tot:  1781
pthread_adaptive  Threads:  3  Real: 12121, User:  3368, Sys:   177, Tot:  3545

gcc_atomics       Threads:  4  Real:  3439, User:  1025, Sys:     0, Tot:  1025
pthread_spinlock  Threads:  4  Real:  3370, User:  1267, Sys:     0, Tot:  1267
pthread_mutex     Threads:  4  Real: 10111, User:   851, Sys:  2901, Tot:  3752
pthread_adaptive  Threads:  4  Real:     1, User:  6317, Sys:   845, Tot:  7162

pthread adaptive mutexes are supposed to give you the best of both spin and 
mutex locks, but testing shows that in this scenario they are never better than 
a plain mutex and are considerably worse under contention.  Although they do 
use less kernel time than plain mutexes, I removed the implementation from 
this patch.

pthread_mutex is universally supported but shows the effect of context 
switching when there's lock contention.  It's the last resort and perhaps should 
be removed in favor of a #error stating that no spinlock implementation could 
be found.

pthread_spinlock is gaining support but is not in all pthread implementations 
(OSX lacks it, for one).  It uses no kernel time at all.

gcc_atomics is also gaining support and seems to be more widely supported than 
pthread_spinlock.  It uses no kernel time at all.

With infrequent lock contention, both gcc_atomics and pthread_spinlock are 
comparable to ast_atomic_fetchadd_int in performance.

Although I don't have any empirical data to back me up (yet), I believe that with 
osx_atomics, gcc_atomics and pthread_spinlock, all major platforms are supported.


Diffs
---

  branches/12/include/asterisk/lock.h 411364 
  branches/12/include/asterisk/autoconfig.h.in 411364 
  branches/12/configure.ac 411364 
  branches/12/configure UNKNOWN 

Diff: https://reviewboard.asterisk.org/r/3405/diff/


Testing
---


Thanks,

George Joseph


Re: [asterisk-dev] [Code Review] 3405: Add ast_spinlock capability to lock.h

2014-03-27 Thread George Joseph


(Updated March 27, 2014, 2:57 p.m.)


Review request for Asterisk Developers.


Changes
---

EDIT:  I forgot to mention I'm working on a GAS fallback.


Bugs: ASTERISK-23553
https://issues.asterisk.org/jira/browse/ASTERISK-23553


Repository: Asterisk


Description (updated)
---

Still testing but I'd like some initial feedback.

In some circumstances the atomic fetch/add/test calls are not quite flexible 
enough, but a full-fledged mutex or rwlock is too expensive.

Spin locks are a good solution.  They should be used only for protecting very 
short blocks of critical code such as simple compares combined with integer 
math.  Operations that may block, hold a lock, or cause the thread to give up 
its timeslice should NEVER be attempted in a spin lock.

So, this patch adds the following APIs to lock.h:

ast_spinlock_init
ast_spinlock_lock
ast_spinlock_trylock
ast_spinlock_unlock

Depending on the capabilities determined by configure, one of the following 
implementations is chosen, in order of preference: OSX Atomics (OSX only), 
GCC Atomics, Pthread Spinlock and, as a final fallback, Pthread Mutex.

Performance...

Simple test: 25,000,000 iterations of (lock, test, calculate, test, unlock) per 
thread.
All times are in milliseconds.  I have no way to test OSX Atomics.

gcc_atomics       Threads:  1  Real:   204, User:    20, Sys:     0, Tot:    20
pthread_spinlock  Threads:  1  Real:   194, User:    19, Sys:     0, Tot:    19
pthread_mutex     Threads:  1  Real:   464, User:    46, Sys:     0, Tot:    46
pthread_adaptive  Threads:  1  Real:   464, User:    46, Sys:     0, Tot:    46

gcc_atomics       Threads:  2  Real:  1016, User:   191, Sys:     0, Tot:   191
pthread_spinlock  Threads:  2  Real:  1142, User:   209, Sys:     0, Tot:   209
pthread_mutex     Threads:  2  Real:  2902, User:   362, Sys:   201, Tot:   563
pthread_adaptive  Threads:  2  Real:  4557, User:   896, Sys:     9, Tot:   905

gcc_atomics       Threads:  3  Real:  2273, User:   556, Sys:     0, Tot:   556
pthread_spinlock  Threads:  3  Real:  2318, User:   589, Sys:     0, Tot:   589
pthread_mutex     Threads:  3  Real:  6257, User:   834, Sys:   947, Tot:  1781
pthread_adaptive  Threads:  3  Real: 12121, User:  3368, Sys:   177, Tot:  3545

gcc_atomics       Threads:  4  Real:  3439, User:  1025, Sys:     0, Tot:  1025
pthread_spinlock  Threads:  4  Real:  3370, User:  1267, Sys:     0, Tot:  1267
pthread_mutex     Threads:  4  Real: 10111, User:   851, Sys:  2901, Tot:  3752
pthread_adaptive  Threads:  4  Real:     1, User:  6317, Sys:   845, Tot:  7162

pthread adaptive mutexes are supposed to give you the best of both spin and 
mutex locks, but testing shows that in this scenario they are never better than 
a plain mutex and are considerably worse under contention.  Although they do 
use less kernel time than plain mutexes, I removed the implementation from 
this patch.

pthread_mutex is universally supported but shows the effect of context 
switching when there's lock contention.  It's the last resort and perhaps should 
be removed in favor of a #error stating that no spinlock implementation could 
be found.

pthread_spinlock is gaining support but is not in all pthread implementations 
(OSX lacks it, for one).  It uses no kernel time at all.

gcc_atomics is also gaining support and seems to be more widely supported than 
pthread_spinlock.  It uses no kernel time at all.

With infrequent lock contention, both gcc_atomics and pthread_spinlock are 
comparable to ast_atomic_fetchadd_int in performance.

Although I don't have any empirical data to back me up (yet), I believe that with 
osx_atomics, gcc_atomics and pthread_spinlock, all major platforms are supported.

EDIT:  I forgot to mention I'm working on a GAS fallback.


Diffs
---

  branches/12/include/asterisk/lock.h 411364 
  branches/12/include/asterisk/autoconfig.h.in 411364 
  branches/12/configure.ac 411364 
  branches/12/configure UNKNOWN 

Diff: https://reviewboard.asterisk.org/r/3405/diff/


Testing
---


Thanks,

George Joseph
