What's EXIT_SUCCESS?
Call boinc_finish(0).

-- David
On 21-Aug-2010 11:13 AM, Ken Brazier wrote:
> Since this is my first post here, I guess I should introduce myself.
> I'm a volunteer developer for PrimeGrid. I've managed to make a couple
> of native BOINC applications for them, but one of them, a CUDA app, is
> doing strange things in some cases.
>
> Here's the main problem (that I'm asking about here) in the stderr output:
>
> <core_client_version>6.10.17</core_client_version>
> <![CDATA[
> <message>
> Maximum elapsed time exceeded
> </message>
> <stderr_txt>
> Sieve started: 420825000000000<= p< 420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p< 420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.46 sec. (0.05 init + 1690.41 sieve) at 591620 p/sec.
> Processor time: 10.50 sec. (0.06 init + 10.44 sieve) at 95793042 p/sec.
> Average processor utilization: 1.24 (init), 0.01 (sieve)
> Sieve started: 420825000000000<= p< 420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p< 420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.48 sec. (0.05 init + 1690.43 sieve) at 591612 p/sec.
> Processor time: 10.50 sec. (0.05 init + 10.45 sieve) at 95701374 p/sec.
> Average processor utilization: 1.03 (init), 0.01 (sieve)
> Sieve started: 420825000000000<= p< 420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p< 420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.48 sec. (0.05 init + 1690.43 sieve) at 591612 p/sec.
> Processor time: 10.51 sec. (0.05 init + 10.46 sieve) at 95609881 p/sec.
> Average processor utilization: 1.03 (init), 0.01 (sieve)
> Sieve started: 420825000000000<= p< 420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p< 420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.47 sec. (0.05 init + 1690.42 sieve) at 591614 p/sec.
> Processor time: 10.49 sec. (0.05 init + 10.44 sieve) at 95793042 p/sec.
> Average processor utilization: 1.03 (init), 0.01 (sieve)
>
> </stderr_txt>
> ]]>
>
> The app completes, apparently successfully, but then for some reason
> BOINC restarts it. Again and again! Also, although the app ran for ~10
> CPU-seconds at a time four times (that's normal for CUDA apps), and had
> 6,724.62 seconds of runtime in total, BOINC recorded 0 seconds of CPU time.
>
> If you're interested, the code is v0.1.5, from here:
> http://github.com/Ken-g6/PSieve-CUDA
>
> After that last fprintf in main.c, I do only two things. First, I
> re-raise any SIGINT, SIGTERM, or SIGHUP that the process may have
> received. And second I call boinc_finish, with the argument EXIT_SUCCESS
> as evidenced by another line in that result.
>
> I should add that there are also cases where the app restarts on
> failure. Here's one:
>
> <core_client_version>6.10.58</core_client_version>
> <![CDATA[
> <message>
> Unzulässige Funktion. (0x1) - exit code 1 (0x1)
> </message>
> <stderr_txt>
> Sieve started: 456189000000000<= p< 456190000000000
> Thread 0 starting
> Detected GPU 0: Device Emulation (CPU)
> Detected compute capability: 9999.9999
> Detected 16 multiprocessors.
> Insufficient available memory on GPU 0.
> 16:42:43 (6472): called boinc_finish
> Sieve started: 456189000000000<= p< 456190000000000
> Thread 0 starting
> Detected GPU 0: Device Emulation (CPU)
> Detected compute capability: 9999.9999
> Detected 16 multiprocessors.
> Insufficient available memory on GPU 0.
> 16:45:26 (8132): called boinc_finish
>
> </stderr_txt>
> ]]>
>
> The reason for failure isn't relevant here (trying to run on the CUDA
> emulator.) The fact that it tried it again, after once calling
> boinc_finish(1), is. Plus, there are no signals trapped on that path.
>
> So why did BOINC restart my app? And why didn't it count any of the
> runtime? Should I not re-raise those signals? I built the app with the
> development files that came with Ubuntu 9.04: version 6.2.18-3ubuntu1.
> Is that too old? Any other ideas?
>
> Thanks!
>
> Ken
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
