Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 230 by [email protected]: timer accuracy v one shot timer v expiry time (issue_163.t)
http://code.google.com/p/memcached/issues/detail?id=230

What steps will reproduce the problem?
1. Run memcached/t/issue_163.t on a platform that has a course timer and isn't that fast - repeatedly

What is the expected output? What do you see instead?
Test should pass,

test fails sometimes with:
not ok 4 - Objects should be reclaimed
#   Failed test 'Objects should be reclaimed'
#   at t/issue_163.t line 27.
#          got: '0'
#     expected: '1'


What version of the product are you using? On what operating system?

1.7.1.1, built from source on ARM (Panda board) running Linaro 11.09 3.0.0-1404-linaro-lt-omap kernel

Please provide any additional information below.

I believe the problem here is a combination of factors with memcached's timing and the way the test is written and the courseness of the timer on the board. Bumping up the sleep from 2 to 3 seconds in the test fixes it - but there are some underlying issues.

The test stores some data with a 1 second expiry, waits 2 seconds and stores data again, testing that the slot got reclaimed after the expiry;


However:
  1) memcached's only stores a 1 second precision timer.
2) clock_handler uses a 1 shot timer, resetting it every time it's called, and hence clock_handler gets called less regularly than once a second

So I added some debug:

a 1..7
b DAG: set_current_time: current_time=2 timer=1319726830/136748
c DAG: set_current_time: current_time=2 timer=1319726830/236723
d DAG: set_current_time: current_time=2 timer=1319726830/241087
e DAG do_item_alloc fall through: current_time=2 (new)exptime=3 time=1319726830/247984
f ok 1 - stored key
g ok 2 - We should have requested some memory
h DAG: set_current_time: current_time=3 timer=1319726831/563994
i DAG do_item_alloc tail loop: current_time=3 refcount=0 exptime=3 time=1319726832/296141 j DAG do_item_alloc fall through: current_time=3 (new)exptime=0 time=1319726832/296752
k ok 3 - stored key
l not ok 4 - Objects should be reclaimed
m #   Failed test 'Objects should be reclaimed'
n #   at t/issue_163.t line 27.
o #          got: '0'
p #     expected: '1'
q ok 5 - deleted key
r DAG do_item_alloc fall through: current_time=3 (new)exptime=0 time=1319726832/306609
s ok 6 - stored key
t ok 7 - we've not allocated and freed the same amount
u # Looks like you failed 1 test of 7.

So in this case the first store (on line e) happens when current_time is 2 at wallclock ...830.247984 seconds; and the expiry is set to 2+1=3

(line h) The clock_handler loop appears to have been very sluggish, and we get back to set_current_time at ...831.563994 ~1.3s later where we set current_time to 3

Then we go and do the second set (line i), current time is still '3' yet the wallclock is just over 2 seconds since the original store. And since the expiry was 3 and current time is 3, we haven't expired and hence the test fails.

---
OK, so the OS seems to have been very sluggish at the event for clock_handler; I think the granularity of the timer is 128Hz, so I'm not quite sure why it took 1.3s rather than 1.008 - but the fact it's taking any longer than 1 (which is legal) means the 2second limit in the test 163 isn't safe.

Dave

Reply via email to