Hi Stefan,

On 28.04.2011 00:34, Stefan Fritsch wrote:
On Tuesday 26 April 2011, Rainer Jung wrote:
+1, although there are still two problems on Solaris 10 in
test_reslist; they are not a regression.

I built and ran make check on the following platforms:

- Solaris 8 + 10, Sparc
- SuSE Linux Enterprise 10 32 and 64 Bit
- RedHat Enterprise Linux 5, 64 Bit

Using all combinations of:

apr 1.3.12 / 1.4.2
expat builtin / 2.0.1
dso disable / enable
Berkeley DB 4.8.30 / 5.0.26 / 5.1.19
sqlite 3.7.2
mysql 6.0.2 (only Solaris)
oracle 10.2.0.4.0 (only Solaris)

All builds succeeded and all make check runs passed, except for two cases
on Solaris 10 (this time not Niagara, but an old sun4u V240
with 2 CPUs).

I reran the tests and couldn't reproduce the problem, so it is not
deterministic. Out of 48 build combinations on Solaris 10, only
three had a problem. This is similar to 1.3.10, but the affected
combinations are not always the same. As with 1.3.10, the problem
occurs on Solaris 10 but not on Solaris 8.

Details on Solaris 10 test failures

- only in testreslist
- two types of failures:
    - two crashes (segmentation fault)
    - one non-terminating loop
- The crashes do not seem related to the apr version used (one with 1.3
and one with 1.4)

I also get nondeterministic test failures on the Debian build machines,
mostly hangs in testreslist. It happens on mipsel and sparc much more
often than on the other architectures, and some architectures had no
failure at all. Which compiler are you using? If you are using gcc, it
could be a gcc bug.

On Sparc I use gcc 4.1.2. All builds are 32 Bit.

Concerning the hangs (non-terminating loops in my case), I did some more investigation for 1.3.10 and confirmed using GDB that there actually was a cycle in the cleanups:

(gdb) print c
$1 = (cleanup_t *) 0x38558
(gdb) print *c
$2 = {next = 0x38558, data = 0x38558, plain_cleanup_fn = 0x38710, child_cleanup_fn = 0x38798}

so c == c->next and thus apr_pool_cleanup_kill looped.

I didn't check whether that was still true for 1.3.11. I don't know why c == c->next.
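To illustrate why a single self-referencing node is enough to hang the kill, here is a small, self-contained C sketch. The `cleanup_t` layout only mirrors the fields GDB printed above and is not copied from APR's source; `cleanup_kill_sketch` and `cleanup_list_has_cycle` are hypothetical helpers, not APR functions, and the unlink loop is an assumed shape for such a list walk rather than the real apr_pool_cleanup_kill. The cycle check is Floyd's tortoise-and-hare algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node modeled on the fields GDB printed; not APR's definition. */
typedef struct cleanup_t cleanup_t;
struct cleanup_t {
    cleanup_t *next;
    const void *data;
    int (*plain_cleanup_fn)(void *);
    int (*child_cleanup_fn)(void *);
};

/* Floyd's tortoise-and-hare: returns true if following ->next from head
 * ever revisits a node (for example c->next == c). Uses O(1) extra memory. */
static bool cleanup_list_has_cycle(const cleanup_t *head)
{
    const cleanup_t *slow = head;
    const cleanup_t *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}

/* Assumed shape of a cleanup-kill walk: advance until a node with matching
 * data and function is found, then unlink it. If the list contains a node
 * with c->next == c and no node matches, this loop never terminates. */
static bool cleanup_kill_sketch(cleanup_t **head, const void *data,
                                int (*fn)(void *))
{
    cleanup_t **lastp = head;
    cleanup_t *c = *head;
    while (c != NULL) {
        if (c->data == data && c->plain_cleanup_fn == fn) {
            *lastp = c->next;   /* unlink the matching node */
            return true;
        }
        lastp = &c->next;
        c = c->next;
    }
    return false;
}
```

For a self-referencing node such as the one in the GDB session, `cleanup_list_has_cycle` returns true on its first iteration, so a debug build could run it before walking the list instead of spinning forever.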

Concerning gcc: I use the same gcc for building on Solaris 8 and on Solaris 10, even the same gcc binaries. I never observed a problem on the single-CPU Solaris 8 system, but did observe problems on Solaris 10 with both 1.3.10 and 1.3.11. Apart from the OS version, the other major difference is hardware concurrency: a Niagara CPU with 6 or 8 cores and four times as many strands when testing 1.3.10, and a more traditional 2-CPU Sparc V240 when testing 1.3.11.

I hope to find some time to check older versions, like 1.3.9, and maybe also older apr (pool) versions to see whether I can narrow down the cause. Unfortunately, so far I could only reproduce the two problems (non-terminating loop, crash) when testing as part of the mass build, which takes a couple of hours. When running testall after the build, even in loops, I could not reproduce the problems ...

Regards,

Rainer
