I put in an option to build a wakeup latency histogram for semaphores.  After running timetst twice, it prints the following.  The left column is in microseconds and the right column is the number of times the latency fell into that interval.  Note that there are some large outliers and that the distribution is bimodal.

       1      2675293
       2      1191365
       3       800651
       4       768345
       5       860483
       6      1027069
       7       251115
       8           20
       9       123572
      10       121859
      20      3129553
      30      2216368
      40       396652
      50         2413
      60           91
      70           36
      80           48
      90           37
     100           22
     200          105
     300           29
     400            1
     500            4
     600            1
     700            3
    3000           47
    4000            4
   20000           10
   30000            7
   50000            4
   70000            4
  200000           12
  300000            4
 2000000            1
 3000000           10
 4000000           10
 6000000            2
 9000000            6
  Larger            7
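The bucket boundaries are not stated explicitly. Judging from the labels (1 to 10, then steps of 10, 100, and so on up to 9000000, plus "Larger"), a plausible scheme is one-significant-digit buckets. Below is a minimal C sketch, assuming each label is the upper bound of its interval; the actual timetst code may differ, and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed scheme: a latency is counted under the smallest label
 * d * 10^k (d = 1..9) that is >= the latency, for labels 1 .. 9000000;
 * anything larger goes into a final "Larger" slot. */
#define DECADES       7                  /* scales 1, 10, ..., 1000000 */
#define NBUCKETS      (DECADES * 9 + 1)  /* +1 for "Larger" */
#define LARGER_BUCKET (NBUCKETS - 1)

/* Map a latency in microseconds to a bucket index. */
static size_t latency_bucket(uint64_t usecs)
{
    uint64_t scale = 1;
    for (size_t decade = 0; decade < DECADES; decade++, scale *= 10) {
        for (uint64_t d = 1; d <= 9; d++) {
            if (usecs <= d * scale)
                return decade * 9 + (size_t)(d - 1);
        }
    }
    return LARGER_BUCKET;
}

/* Left-column label for a bucket index; 0 stands for "Larger". */
static uint64_t bucket_label(size_t idx)
{
    if (idx >= LARGER_BUCKET)
        return 0;
    uint64_t scale = 1;
    for (size_t i = 0; i < idx / 9; i++)
        scale *= 10;
    return (uint64_t)(idx % 9 + 1) * scale;
}

/* Count one observed wakeup latency. */
static void record_latency(uint64_t hist[NBUCKETS], uint64_t usecs)
{
    hist[latency_bucket(usecs)]++;
}
```

For example, a latency of 11 microseconds lands under the label 20, and anything above 9000000 microseconds falls into the "Larger" slot.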

At 02:30 AM 7/29/2004, Golbach, Klaus wrote:
This is my second try; the last mail was too long.

Hello LiS community,
I tested LiS-2.17-R on Intel (x86) uniprocessor and dual-processor
machines. The Linux version is SuSE SLES8.
I ran strtst and, depending on the amount of debug logging, the type of
machine and other conditions I have not identified, the test sometimes
runs cleanly but mostly fails in one of the "band_tests".
On the two-processor machine it was sometimes impossible to unload
the streams module, and I had to reboot the machine.

Looking at the log output in messages, I suspected that the
appropriate service routines were not being called in time.
I set the last option asked in the Configure phase to n and added
some debugging statements to check whether this is correct. (The
changes are appended at the end as a diff.)

Debugging this way is not nice, because:
- Adding or removing printk statements changes the behaviour in
either direction: worse or better.
- You can never be sure that some important entries are not lost.
So I cannot present "reasonable traces" of all the errors that
occurred, and I can reproduce a given error only with some luck.
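One common way to reduce both problems is to record events into a fixed-size in-memory ring buffer and dump it after the failure: each event costs only a few stores, so timing is perturbed far less than with printk, and the buffer always holds the most recent N entries instead of dropping arbitrary ones. The sketch below is user-space C with hypothetical names (trace, trace_ring, the EV_* IDs); a kernel version would need per-CPU buffers or locking and a kernel timestamp source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event IDs for points of interest in the queue path. */
enum trace_id { EV_PUTQ = 1, EV_QSCHED, EV_LIS_UP, EV_RUNQ_START, EV_SRV_CALL };

struct trace_event {
    uint64_t seq;    /* global sequence number of this event */
    uint64_t stamp;  /* caller-supplied timestamp (e.g. TSC or usecs) */
    int id;          /* which trace point fired */
    long arg;        /* free-form argument, e.g. a queue address */
};

#define TRACE_SLOTS 1024   /* power of two, so masking replaces modulo */

struct trace_ring {
    struct trace_event ev[TRACE_SLOTS];
    uint64_t next;   /* total number of events ever recorded */
};

/* Record one event; once full, the oldest entry is overwritten.
 * Not thread-safe: a multi-CPU version needs a lock, atomics,
 * or one ring per CPU. */
static void trace(struct trace_ring *r, uint64_t stamp, int id, long arg)
{
    struct trace_event *e = &r->ev[r->next & (TRACE_SLOTS - 1)];
    e->seq = r->next;
    e->stamp = stamp;
    e->id = id;
    e->arg = arg;
    r->next++;
}

/* Copy the retained events into out[], oldest first; returns the count. */
static size_t trace_dump(const struct trace_ring *r, struct trace_event *out)
{
    uint64_t first = r->next > TRACE_SLOTS ? r->next - TRACE_SLOTS : 0;
    size_t n = 0;
    for (uint64_t s = first; s < r->next; s++)
        out[n++] = r->ev[s & (TRACE_SLOTS - 1)];
    return n;
}
```

Because entries carry a sequence number, a dump also shows directly whether older events were overwritten, rather than leaving that uncertain.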

In the appended traces, which are reasonably readable, I also added
remarks to draw your attention to the points that seemed important
to me.
The result:
The message sent down is not processed in time. Generally it takes a
long time until lis_thread_runqueues is woken up.
In the worst case it took up to 0.22 seconds from lis_up to the
scheduling of lis_thread_runqueues!
I guess something is wrong with the priority of this LiS-RunQ process,
but I don't know how to prove it or make it better.
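To quantify a delay like the one from lis_up to lis_thread_runqueues, one can timestamp just before the wakeup and again as the first action of the woken thread, in the spirit of the semaphore wakeup histogram above. The sketch below is a user-space analog, not LiS code: a POSIX semaphore stands in for the LiS semaphore, a pthread stands in for the queue-running thread, and wake_probe, runner and measure_wakeup_once are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <time.h>

/* State shared by the "waker" (stand-in for the lis_up caller)
 * and the "runner" (stand-in for lis_thread_runqueues). */
struct wake_probe {
    sem_t sem;
    struct timespec posted;   /* time taken just before sem_post() */
    int64_t latency_usecs;    /* filled in by the runner */
};

static int64_t ts_diff_usecs(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000
         + (b->tv_nsec - a->tv_nsec) / 1000;
}

static void *runner(void *arg)
{
    struct wake_probe *p = arg;
    struct timespec now;

    sem_wait(&p->sem);                     /* sleep until woken */
    clock_gettime(CLOCK_MONOTONIC, &now);  /* first thing after wakeup */
    p->latency_usecs = ts_diff_usecs(&p->posted, &now);
    return NULL;
}

/* Measure one wakeup; returns latency in microseconds, or -1 on error. */
static int64_t measure_wakeup_once(void)
{
    struct wake_probe p = { .latency_usecs = -1 };
    pthread_t t;

    if (sem_init(&p.sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, runner, &p) != 0) {
        sem_destroy(&p.sem);
        return -1;
    }
    struct timespec delay = { 0, 1000000 };  /* 1 ms: let the runner block */
    nanosleep(&delay, NULL);

    clock_gettime(CLOCK_MONOTONIC, &p.posted);
    sem_post(&p.sem);                        /* the "lis_up" moment */
    pthread_join(t, NULL);
    sem_destroy(&p.sem);
    return p.latency_usecs;
}
```

Calling measure_wakeup_once() in a loop and bucketing the results gives a user-space baseline to compare against kernel-side numbers. The 1 ms nanosleep only makes it likely that the runner is already blocked in sem_wait() when the post happens; it does not guarantee it.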

I grepped for the keyword mregparm in all files under /usr/src/ to
rule out the parameter-passing problems mentioned before. I found it
only in comments.

So could you please give me some suggestions on how to proceed with
this problem? Maybe you know more about it.
Other suggestions are also welcome, because I am new to Linux.

I also tested other LiS versions:
LiS-2.16:
No trouble so far. There lis_run_queues is mostly called directly, so
there is no delay.
But we have to use LiS-2.17/18, because we need the feature "per-driver
locking option in LiS".
LiS-2.17-H:
The same problems as in LiS-2.17-R. Additionally, the two-processor
machine panics. There are also some minor problems.

I changed lis_setqsched in the LiS-2.17* versions (see the switch
CALL_DIRECT in linux-mdep.c) so that it calls lis_run_queues directly
in most cases. strtst did not fail.
By the way: strtst takes 161 seconds the old way and 95 seconds with
the direct call.

This is my emergency exit (though I do not feel good about it):
Changing to the direct call changes the behaviour and will be a
workaround for a while, until we go into rigorous testing.
But it is not the real solution. It will not work in interrupt
context, or when lis_in_syscall==1 or can_call==0.
Additionally, Dave changed his strategy away from the direct call, and
I do not know why, nor whether the flags lis_in_syscall and can_call
will be set correctly in this and future versions of LiS, since they
are no longer needed for his new strategy in lis_setqsched.
What do you think?
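The trade-off above boils down to one decision: run the queues inline when that is safe, otherwise wake the queue-running thread. The sketch below encodes that decision as this mail describes it. It is not the real lis_setqsched from linux-mdep.c; the flags are modeled as plain fields with assumed semantics, and qsched_ctx and choose_qsched are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the conditions named in the mail. In LiS
 * these come from kernel state and LiS globals; here they are fields. */
struct qsched_ctx {
    bool in_interrupt;   /* running in interrupt context */
    int  in_syscall;     /* stand-in for lis_in_syscall (mail: ==1 blocks) */
    int  can_call;       /* stand-in for can_call (mail: ==0 blocks) */
};

enum qsched_action { RUN_QUEUES_DIRECT, WAKE_RUNQ_THREAD };

/* Direct call only when none of the blocking conditions holds;
 * otherwise defer to the queue-running kernel thread. */
static enum qsched_action choose_qsched(const struct qsched_ctx *c)
{
    if (c->in_interrupt || c->in_syscall != 0 || c->can_call == 0)
        return WAKE_RUNQ_THREAD;
    return RUN_QUEUES_DIRECT;
}
```

The measured payoff of the direct branch is the strtst run time quoted above (161 s versus 95 s, roughly 41% less wall time); the deferred branch is what still has to work, promptly, in the contexts listed.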

best regards
   Klaus Golbach
