Hi, here is a problem in hugeshmctl01 and a patch to fix it. Please review. Thanks.
Problem description (Shen, Lin Feng: [EMAIL PROTECTED]):
I am testing hugetlb with ltp-full-20080430. The cases under
${LTPROOT}/testcases/kernel/mem/hugetlb/ are executed one by one in a
loop. The tests run fine for the first few hundred loops, but after
hugeshmctl01 fails for the first time, several other cases start to fail
frequently too.
---------------- Here is the staf status -----------------
clashlp1:/proc/sys/kernel # gss
Hostname : clashlp1
Kernel : 2.6.16.60-0.17-ppc64
Kernel Build Date : Tue Apr 22 07:28:35 UTC 2008
Distribution : SUSE
--------
Job ID : 1
Focus Group : BASE
XML File Name : /usr/local/staf/xml/clashlp1.base.xml
Function : Test
Arguments : null
Start Date : 20080502
Start Time : 14:32:06
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
BASE Start Time: Fri May 2 14:32:06 CDT 2008
Snapshot Time: Sun May 4 03:48:38 CDT 2008
--------
hugemmap01 (0)-local;944;7858;8802
hugemmap02 (0)-local;8802;0;8802
hugemmap03 (0)-local;8801;0;8802
hugemmap04 (0)-local;908;7893;8801
hugeshmat01 (0)-local;945;7857;8802
hugeshmat02 (0)-local;909;7893;8802
hugeshmat03 (0)-local;945;7857;8802
hugeshmctl01 (0)-local;943;7859;8802
hugeshmctl02 (0)-local;908;7894;8802
hugeshmctl03 (0)-local;944;7858;8802
hugeshmdt01 (0)-local;944;7858;8802
hugeshmget01 (0)-local;945;7857;8802
hugeshmget02 (0)-local;8802;0;8802
hugeshmget03 (0)-local;8802;0;8802
hugeshmget05 (0)-local;945;7857;8802
--pass--fail--unused
---------------- Here is the ltp log ----------------
The first failure is hugeshmctl01.
hugeshmctl01 1 FAIL : # of attaches is incorrect - 3
hugeshmctl01 2 PASS : pid, size, # of attaches and mode are correct - pass #2
hugeshmctl01 3 PASS : new mode and change time are correct
hugeshmctl01 4 PASS : shared memory appears to be removed
------- Here is the meminfo -------
before hugeshmctl01 fails:
clashlp1:~ # cat /proc/meminfo | tail -4
HugePages_Total: 32
HugePages_Free: 32
HugePages_Rsvd: 0
Hugepagesize: 16384 kB
clashlp1:~ #
after hugeshmctl01 fails:
clashlp1:~ # cat /proc/meminfo | tail -4
HugePages_Total: 32
HugePages_Free: 30
HugePages_Rsvd: 30
Hugepagesize: 16384 kB
clashlp1:~ #
-------------------------------------
It seems that hugeshmctl01 doesn't free some hugetlb pages when it fails.
ps shows that an instance of hugeshmctl01 is still left behind even though
the test is no longer running, and it may still have some hugetlb pages
attached.
-------------------------------------
clashlp1:~ # ps ax | grep huge
14166 pts/23 S+ 0:00 grep huge
29360 ? S 0:00 hugeshmctl01
clashlp1:~ #
-------------------------------------
The problem is caused by the arbitrary usleep() time in hugeshmctl01, which
can result in an incorrect execution order. The sleep is intended to ensure
that the children call shmat() and pause() before the parent checks the shm
status and calls stat_cleanup(). But there is no guarantee that this sleep
is always long enough.
------------
/* sleep briefly to ensure correct execution order */
usleep(250000);
------------
In the failure above, the last child forked by the parent may not have run
and called shmat() right after it was created. When the parent checks the
shm status, it finds only 3 children attached to the shm instead of 4, so
it reports the failure. It then calls stat_cleanup() to send SIGUSR1 to all
children, but since the last child hasn't called pause() yet, SIGUSR1 is
handled before pause(). When the last child finally calls pause(), there is
no further signal to wake it up, so it sleeps forever.
Patch:
The patch does not change the arbitrary usleep() time, since any value is
arbitrary, although a larger one would be more tolerable. Instead, the
children use sigprocmask() to block SIGUSR1 before they wait for SIGUSR1
from the parent, and then call sigsuspend() to atomically unblock SIGUSR1
and sleep until it arrives. This should avoid the infinite sleep, which
would otherwise keep the shm attached forever and affect the other hugetlb
tests.
In the parent process, another sigprocmask() is called before usleep(). This
has the same effect as sleeping longer.
fix_hugeshmctl01_children_pause_forever.patch
Description: Binary data
