In article <[EMAIL PROTECTED]>,
Mike Smith  <[EMAIL PROTECTED]> wrote:
> 
> It's not necessarily caused by interrupt latency.  Here's the assumption 
> that's being made.
[...]

Thanks for the superb explanation!  I appreciate it.

> There is a ring of timecounter structures, of some size.  In testing,
> I've used sizes of a thousand or more, but still seen this problem.
> 
> There is a pointer to the "current" timecounter structure.

That's the global variable named "timecounter", right?  I did notice
one potential problem: that variable is not declared volatile.  So
in this part ...

> When one wishes to read the current time, one proceeds as follows:
> 
>  - Get the "current" pointer and save it locally.
>  - Read the timecounter structure via the local "current" pointer.

... the compiler is perfectly free to reread the global multiple
times in the function rather than using the saved local copy.  If the
"current" pointer has moved in that time, we'll an inconsistent view
of the timecounter.  In looking at the generated code I haven't found
any actual instances of that.  But I'll try making it volatile just to
make sure.  Even if it doesn't cause any problems currently, I think
we should change it to volatile since it could start to cause problems
some day.

I also noticed this in tco_forward():

        tco = timecounter;
        tc = sync_other_counter();
        [...]
        if (tco->tc_poll_pps)

But sync_other_counter() loads its own copy of "timecounter",
and there's no guarantee it hasn't changed from the value that
tco_forward() saved in its local variable.  I'm not sure yet if
that's a potential problem.  It could corrected by passing "tco" as
an argument to sync_other_counter.  I'll try that too.

> There are a couple of possible problems with this mechanism.
> 
> One is that the ring "catches up" with your saved copy of the
> "current" pointer, ie. inbetween fetching the pointer and reading the
> timecounter contents, the "next" pointer passes over you again in such
> a fashion that you get garbage out of the structure.

As you mentioned, with a large enough ring this should be impossible.
If I read the code correctly, the "current" pointer is only moved
once per second.  So in the current ring of 4 counters (number 0 is
special), it would take 4 seconds to wrap around the ring.  That's a
pretty long time.

> Another is that there is a race between multiple updaters of the
> timecounter; if two parties are both updating the "next" timecounter
> along with another party trying to get the "current" time, this could
> cause corruption.
> 
> All that interrupt latency will do is make the updates late; I can't
> actually see how it could cause corruption.  Corruption has to be
> caused by mishandling of the timecounter ring in some fashion.

I agree.

> Note that you can probably eliminate the ring loop theory by
> allocating a very large number of entries in the ring by setting
> NTIMECOUNTER (kern/kern_tc.c) higher.  The structures are small; try
> 100,000 or so.

OK, but even the thousand you tried should give a cushion of more
than 16 minutes.

> If you can reproduce under these circumstances, try adding some checks
> to make sure the "current" timecounter pointer is behaving
> monotonically; just save the last timecounter pointer in microtime()
> et. al.
> 
> Another test worth performing is to look at the tco_delta function for
> the timecounter and make sure that it returns a sane value, and one
> that doesn't behave out of synch with the interrupt handler that updates
> the timecounter proper.  If you save the delta value in the timecounter 
> and zero it when it's updated, you can catch this.
> 
> You can rule this out by using getmicroptime() rather than
> microuptime(); it may return the same value twice, which isn't
> desirable, but that would be better than nothing.
> 
> Hope this helps a bit.

Yep, thanks again.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.                        Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Reply via email to