Attached is a patch against trunk that fixes this problem by changing the test suite. For some of the tests, it is sufficient to change the poll() timeout from 0 (a non-blocking poll that returns immediately) to -1 (a poll that blocks until at least one descriptor is ready). In a few other places, the semantics of the test needed to change: e.g. if we do a blocking poll() after sending two messages, we need to accept seeing either 1 or 2 messages.
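The two adjustments can be sketched with plain POSIX poll() on two loopback UDP sockets. The first is a timeout of -1 instead of 0. The second is a check that accepts either 1 or 2 ready sockets after two sends.

This is a sketch under assumptions, not the patch itself. It does not use APR's apr_pollset_poll(). The names bound_udp_socket, send_two_then_poll, and check_send_two_then_poll are hypothetical helpers, not code from testpoll.c.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to an ephemeral port on
 * 127.0.0.1 and store the bound address in *addr. Returns the descriptor,
 * or -1 on error. */
static int bound_udp_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel choose a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: send one datagram to each of two sockets, then poll
 * both. Returns poll()'s result (number of readable sockets), or -1. */
static int send_two_then_poll(void)
{
    struct sockaddr_in addr[2];
    struct pollfd pfd[2];
    int i, rv = -1;
    int sender = socket(AF_INET, SOCK_DGRAM, 0);

    for (i = 0; i < 2; i++) {
        pfd[i].fd = bound_udp_socket(&addr[i]);
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    if (sender < 0 || pfd[0].fd < 0 || pfd[1].fd < 0)
        goto done;

    for (i = 0; i < 2; i++) {
        if (sendto(sender, "x", 1, 0, (struct sockaddr *)&addr[i],
                   sizeof(addr[i])) != 1)
            goto done;
    }

    /* Timeout -1: block until at least one socket is readable. With a
     * timeout of 0, poll() may legitimately return 0 here, because the
     * datagrams need not have been delivered yet. */
    rv = poll(pfd, 2, -1);

done:
    for (i = 0; i < 2; i++)
        if (pfd[i].fd >= 0)
            close(pfd[i].fd);
    if (sender >= 0)
        close(sender);
    return rv;
}

/* Test-style check: a blocking poll may report one or both sockets as
 * ready, so accept either outcome instead of requiring exactly 2. */
static void check_send_two_then_poll(void)
{
    int ready = send_two_then_poll();
    assert(ready == 1 || ready == 2);
}
```

Like the patched tests, this sketch blocks forever if both datagrams are lost. A finite timeout would turn that case into a failure instead of a hang. The cost is that it reintroduces a (much weaker) timing assumption.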
With these changes applied, OSX 10.6 passes testpoll reliably (~1000 local runs) using both the POLLSET_POLL and POLLSET_KQUEUE methods.

Neil

On Sat, Oct 24, 2009 at 7:30 PM, Neil Conway <[email protected]> wrote:
> On a related note, ISTM that many of the tests for the poll / pollset
> features are wrong in principle. They apparently assume that if you
> send a UDP datagram to localhost and then immediately poll() for it
> (with a timeout of zero), the poll() will pick up the UDP datagram you
> just sent. That is not a safe assumption, however (e.g. I see
> intermittent test failures due to this issue when using
> APR_POLLSET_POLL on OSX 10.6).
>
> Similarly, send_middle_pollset() assumes that if you send two
> datagrams and then poll(), the poll will return exactly two datagrams,
> whereas it might actually return 0, 1, or 2. And that's not even
> accounting for the possibility of UDP packet drops, which is possible
> even on localhost if the machine is under load.
>
> Neil
>
> On Sun, Oct 18, 2009 at 4:37 AM, Ruediger Pluem <[email protected]> wrote:
>>
>>
>> On 10/17/2009 11:58 PM, Ryan Phillips wrote:
>>> On Sat, Oct 17, 2009 at 2:40 AM, Ruediger Pluem <[email protected]> wrote:
>>>>
>>>> On 10/17/2009 05:50 AM, Ryan Phillips wrote:
>>>>> On Wed, Oct 14, 2009 at 12:02 PM, Neil Conway <[email protected]>
>>>>> wrote:
>>>>>> "./tests/testall testpoll" segfaults for me consistently on OSX 10.6.1
>>>>>> with the latest code from the 1.4-stable branch (64-bit APR library).
>>>>>> gdb info:
>>>>>>
>>>>>> #0 0x000000010000e9b7 in send0_pollset (tc=0x7fff5fbfef80, data=0x0)
>>>>>> at testpoll.c:389
>>>>>> 389 ABTS_PTR_EQUAL(tc, s[0], descs[0].desc.s);
>>>>>> (gdb) bt
>>>>>> #0 0x000000010000e9b7 in send0_pollset (tc=0x7fff5fbfef80, data=0x0)
>>>>>> at testpoll.c:389
>>>>>> #1 0x0000000100001456 in abts_run_test (ts=0x100200190, f=0x10000e925
>>>>>> <send0_pollset>, value=0x0) at abts.c:168
>>>>>> #2 0x000000010000f713 in testpoll (suite=0x100200190) at testpoll.c:685
>>>>>> #3 0x0000000100001e35 in main (argc=2, argv=0x7fff5fbff020) at
>>>>>> abts.c:424
>>>>>> (gdb) p descs
>>>>>> $1 = (const apr_pollfd_t *) 0x0
>>>>>> (gdb) p s[0]
>>>>>> $2 = (apr_socket_t *) 0x100804240
>>>> What is the value of num?
>>>>
>>>>>> (gdb) l
>>>>>> 384 rv = apr_pollset_poll(pollset, 0, &num, &descs);
>>>>>> 385 ABTS_INT_EQUAL(tc, APR_SUCCESS, rv);
>>>>>> 386 ABTS_INT_EQUAL(tc, 1, num);
>>>>>> 387 ABTS_PTR_NOTNULL(tc, descs);
>>>>>> 388
>>>>>> 389 ABTS_PTR_EQUAL(tc, s[0], descs[0].desc.s);
>>>>>> 390 ABTS_PTR_EQUAL(tc, s[0], descs[0].client_data);
>>>>>> 391 }
>>>>>> 392
>>>>>> 393 static void recv0_pollset(abts_case *tc, void *data)
>>>>>>
>>>> Regards
>>>>
>>>> Rüdiger
>>>>
>>>
>>> Num on the freebsd machine is 0.
>>>
>>
>> Thanks for that.
>>
>> I guess we have two problems here:
>>
>> 1. The crash: We simply should not execute lines 389 and 390 if descs is
>> NULL. Similar situations occur in various other parts of the test suite:
>> we use ABTS_PTR_NOTNULL and then continue to use the pointer that failed
>> ABTS_PTR_NOTNULL. So does this need to be fixed everywhere it occurs?
>> I guess a crash of the test program just because ABTS_PTR_NOTNULL failed
>> is not acceptable.
>>
>> 2. If descs is NULL, the test has failed, since we have the
>> ABTS_PTR_NOTNULL check on line 387. The question is: Why does this test
>> fail?
>>
>> Regards
>>
>> Rüdiger
>>
>
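One way to handle the first problem raised above is to stop the check as soon as num or descs shows that nothing was returned. This avoids dereferencing descs[0] as lines 389-390 do. Per the thread, ABTS_PTR_NOTNULL records the failure but lets the test carry on.

The sketch below is self-contained and does not use APR or ABTS. REQUIRE_NOTNULL, struct result, and check_first_result are hypothetical names, and struct result is only a simplified stand-in for apr_pollfd_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count of failed checks; a stand-in for the test framework's bookkeeping. */
static int failures = 0;

/* Hypothetical helper, not part of ABTS: record a NULL pointer and return
 * from the enclosing void test function instead of dereferencing it. */
#define REQUIRE_NOTNULL(p)                                        \
    do {                                                          \
        if ((p) == NULL) {                                        \
            failures++;                                           \
            fprintf(stderr, "%s:%d: %s is NULL\n",                \
                    __FILE__, __LINE__, #p);                      \
            return;                                               \
        }                                                         \
    } while (0)

/* Hypothetical simplified result record, loosely modelled on apr_pollfd_t. */
struct result {
    const void *desc;
    const void *client_data;
};

/* Check num and descs before reading descs[0]. Sets *checked to 1 only
 * when descs[0] was actually examined. */
static void check_first_result(const struct result *descs, int num,
                               const void *expected, int *checked)
{
    *checked = 0;
    if (num != 1) {
        failures++;
        fprintf(stderr, "expected 1 result, got %d\n", num);
        return;
    }
    REQUIRE_NOTNULL(descs);
    if (descs[0].desc != expected || descs[0].client_data != expected)
        failures++;
    *checked = 1;
}
```

The FreeBSD case (num == 0, descs == NULL) now fails cleanly instead of crashing the whole test program.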
test_poll_timeout_fix-1.patch
