Signed-off-by: Lin Feng Shen <[EMAIL PROTECTED]>
Thanks & Best regards,
----------
Lin Feng Shen 沈林峰
Linux for System p Test, China Systems & Technology Lab
China Development Labs, Beijing Tel: 86-10-82452244 Ext. 53535 Fax: 2312
Email: [EMAIL PROTECTED]
Address: 5F, De Shi Building, No.9, Shangdi East Road, Haidian District,
Beijing, P.R.China 100085
Subrata Modak <[EMAIL PROTECTED]>
05-05-08 02:41 PM
Please respond to
[EMAIL PROTECTED]
To
ltp-list <ltp-list@lists.sourceforge.net>
cc
Lin Feng Shen/China/[EMAIL PROTECTED], supriyak <[EMAIL PROTECTED]>
Subject
[PATCH] Arbitrary usleep time in LTP hugeshmctl01 results in incorrect
execution order
Hi all,
Please see the problem description for the hugeshmctl01 test case in LTP,
and the corresponding solution:
=================================================================
Problem Description: Lin Feng Shen
=================================================================
I am testing hugetlb with ltp-full-20080430. The cases under
${LTPROOT}/testcases/kernel/mem/hugetlb/ are executed one by one, over and
over. The test runs fine for the first few hundred loops, but after
hugeshmctl01 fails for the first time, several other cases start failing
frequently too.
---------------- Here is the staf status -----------------
$> /proc/sys/kernel # gss
Hostname :
Kernel : 2.6.16.60-0.17-ppc64
Kernel Build Date : Tue Apr 22 07:28:35 UTC 2008
Distribution : SUSE
--------
BASE Start Time: Fri May 2 14:32:06 CDT 2008
Snapshot Time: Sun May 4 03:48:38 CDT 2008
--------
hugemmap01 (0)-local;944;7858;8802
hugemmap02 (0)-local;8802;0;8802
hugemmap03 (0)-local;8801;0;8802
hugemmap04 (0)-local;908;7893;8801
hugeshmat01 (0)-local;945;7857;8802
hugeshmat02 (0)-local;909;7893;8802
hugeshmat03 (0)-local;945;7857;8802
hugeshmctl01 (0)-local;943;7859;8802
hugeshmctl02 (0)-local;908;7894;8802
hugeshmctl03 (0)-local;944;7858;8802
hugeshmdt01 (0)-local;944;7858;8802
hugeshmget01 (0)-local;945;7857;8802
hugeshmget02 (0)-local;8802;0;8802
hugeshmget03 (0)-local;8802;0;8802
hugeshmget05 (0)-local;945;7857;8802
--pass--fail--unused
---------------- Here is the ltp log ----------------
The first failure is hugeshmctl01.
hugeshmctl01 1 FAIL : # of attaches is incorrect - 3
hugeshmctl01 2 PASS : pid, size, # of attaches and mode are correct - pass #2
hugeshmctl01 3 PASS : new mode and change time are correct
hugeshmctl01 4 PASS : shared memory appears to be removed
------- Here is the meminfo -------
before hugeshmctl01 fails:
clashlp1:~ # cat /proc/meminfo | tail -4
HugePages_Total: 32
HugePages_Free: 32
HugePages_Rsvd: 0
Hugepagesize: 16384 kB
clashlp1:~ #
after hugeshmctl01 fails:
clashlp1:~ # cat /proc/meminfo | tail -4
HugePages_Total: 32
HugePages_Free: 30
HugePages_Rsvd: 30
Hugepagesize: 16384 kB
clashlp1:~ #
-------------------------------------
It seems that hugeshmctl01 doesn't free some hugetlb pages when it fails.
ps shows that an instance of hugeshmctl01 is still left behind even though
the test is no longer running, and it may still have some hugetlb pages
attached.
-------------------------------------
clashlp1:~ # ps ax | grep huge
14166 pts/23 S+ 0:00 grep huge
29360 ? S 0:00 hugeshmctl01
clashlp1:~ #
-------------------------------------
The problem is due to the arbitrary usleep time in hugeshmctl01, which can
result in an incorrect execution order. The sleep is intended to ensure that
the children call shmat() and pause() before the parent checks the shm
status and calls stat_cleanup(), but there is no guarantee that the sleep is
always long enough.
------------
/* sleep briefly to ensure correct execution order */
usleep(250000);
------------
In the failure above, the last child forked by the parent apparently did not
run and call shmat() right after it was created. When the parent checked the
shm status, it found only 3 children attached to the shm instead of 4, so it
reported the failure. It then called stat_cleanup() to send SIGUSR1 to all
children, but since the last child had not called pause() yet, SIGUSR1 was
handled before pause(). When the last child finally called pause(), there
was no further signal to wake it up, so it sleeps forever.
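For reference, the attach count behind a message like "# of attaches is incorrect" is presumably the shm_nattch field that shmctl(IPC_STAT) returns. Below is a minimal sketch of reading it, not the test's actual code. It assumes an ordinary (non-hugetlb) 4096-byte segment so it runs without huge pages configured, and it attaches several times from a single process instead of forking children. The helper name count_attaches() and the MAX_ATTACH limit are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

#define MAX_ATTACH 8	/* hypothetical upper bound for this sketch */

/*
 * Hypothetical helper: attach a private segment n times, then return the
 * attach count reported by the kernel, or -1 on any failure.
 */
long count_attaches(int n)
{
	void *addr[MAX_ATTACH];
	struct shmid_ds buf;
	long nattch = -1;
	int shmid, i, attached = 0;

	assert(n > 0 && n <= MAX_ATTACH);

	/* Plain segment; the real test would pass SHM_HUGETLB here. */
	shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (shmid == -1)
		return -1;

	/* Each successful shmat() increments shm_nattch by one. */
	for (i = 0; i < n; i++) {
		addr[i] = shmat(shmid, NULL, 0);
		if (addr[i] == (void *)-1)
			break;
		attached++;
	}

	/* Read the attach count only if every attach succeeded. */
	if (attached == n && shmctl(shmid, IPC_STAT, &buf) == 0)
		nattch = (long)buf.shm_nattch;

	/* Detach everything and remove the segment so nothing leaks. */
	for (i = 0; i < attached; i++)
		shmdt(addr[i]);
	shmctl(shmid, IPC_RMID, NULL);

	return nattch;
}
```

If one of the expected attaches has not happened by the time IPC_STAT is issued, the count comes back one short, which matches the "3 instead of 4" seen in the log above.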
=================================================================
Patch: Lin Feng Shen
=================================================================
Patch to ensure the children can receive and handle SIGUSR1 from the parent
in pause().
The patch does not change the arbitrary usleep time, since any value is
arbitrary, although a larger one is more acceptable. Instead, it uses
sigprocmask() to block SIGUSR1 before the children wait for SIGUSR1 from the
parent, and then calls sigsuspend() to unblock SIGUSR1 and wait for it. This
avoids the infinite sleep, which would keep the shm attached forever and
affect the other hugetlb tests.
In the parent process, another sigprocmask() is called before usleep(). This
has the same effect as sleeping longer.
With this patch, I don't see the problem again.
--------------------------
Kernel : 2.6.16.60-0.17-ppc64
Kernel Build Date : Tue Apr 22 07:28:35 UTC 2008
Distribution : SUSE
--------
BASE Start Time: Sun May 4 20:26:11 CDT 2008
Snapshot Time: Mon May 5 00:05:21 CDT 2008
--------
hugemmap01 (0)-local;803;0;80
hugemmap02 (0)-local;803;0;80
hugemmap03 (0)-local;803;0;80
hugemmap04 (0)-local;803;0;80
hugeshmat01 (0)-local;803;0;80
hugeshmat02 (0)-local;803;0;80
hugeshmat03 (0)-local;803;0;80
hugeshmctl01 (0)-local;803;0;80
hugeshmctl02 (0)-local;803;0;80
hugeshmctl03 (0)-local;803;0;80
hugeshmdt01 (0)-local;803;0;80
hugeshmget01 (0)-local;803;0;80
hugeshmget02 (0)-local;803;0;80
hugeshmget03 (0)-local;803;0;80
hugeshmget05 (0)-local;803;0;80
=================================================================
End Description & Solution
=================================================================
Please check whether any of you face the same problem and whether
the patch solves it for you too.
Regards--
Subrata
[attachment "05_05_2008-([EMAIL PROTECTED])-hugeshmctl01.patch" deleted
by Lin Feng Shen/China/IBM]