Since this is my first post here, I guess I should introduce myself. I'm a volunteer developer for PrimeGrid. I've managed to make a couple of native BOINC applications for them, but one of them, a CUDA app, is doing strange things in some cases.
Here's the main problem I'm asking about, as it appears in the stderr output:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
Sieve started: 420825000000000 <= p < 420826000000000
Thread 0 starting
Detected GPU 0: GeForce 9800 GT
Detected compute capability: 1.1
Detected 14 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420825000000000 <= p < 420826000000000 count=29694269,sum=0x6a4d4aa8c5ece825
Elapsed time: 1690.46 sec. (0.05 init + 1690.41 sieve) at 591620 p/sec.
Processor time: 10.50 sec. (0.06 init + 10.44 sieve) at 95793042 p/sec.
Average processor utilization: 1.24 (init), 0.01 (sieve)
Sieve started: 420825000000000 <= p < 420826000000000
Thread 0 starting
Detected GPU 0: GeForce 9800 GT
Detected compute capability: 1.1
Detected 14 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420825000000000 <= p < 420826000000000 count=29694269,sum=0x6a4d4aa8c5ece825
Elapsed time: 1690.48 sec. (0.05 init + 1690.43 sieve) at 591612 p/sec.
Processor time: 10.50 sec. (0.05 init + 10.45 sieve) at 95701374 p/sec.
Average processor utilization: 1.03 (init), 0.01 (sieve)
Sieve started: 420825000000000 <= p < 420826000000000
Thread 0 starting
Detected GPU 0: GeForce 9800 GT
Detected compute capability: 1.1
Detected 14 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420825000000000 <= p < 420826000000000 count=29694269,sum=0x6a4d4aa8c5ece825
Elapsed time: 1690.48 sec. (0.05 init + 1690.43 sieve) at 591612 p/sec.
Processor time: 10.51 sec. (0.05 init + 10.46 sieve) at 95609881 p/sec.
Average processor utilization: 1.03 (init), 0.01 (sieve)
Sieve started: 420825000000000 <= p < 420826000000000
Thread 0 starting
Detected GPU 0: GeForce 9800 GT
Detected compute capability: 1.1
Detected 14 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420825000000000 <= p < 420826000000000 count=29694269,sum=0x6a4d4aa8c5ece825
Elapsed time: 1690.47 sec. (0.05 init + 1690.42 sieve) at 591614 p/sec.
Processor time: 10.49 sec. (0.05 init + 10.44 sieve) at 95793042 p/sec.
Average processor utilization: 1.03 (init), 0.01 (sieve)
</stderr_txt>
]]>

The app completes, apparently successfully, but then for some reason BOINC restarts it. Again and again! Also, although the app ran four times, using ~10 CPU-seconds each time (that's normal for CUDA apps), and had 6,724.62 seconds of runtime in total, BOINC recorded 0 seconds of CPU time.

If you're interested, the code is v0.1.5, from here:
http://github.com/Ken-g6/PSieve-CUDA

After that last fprintf in main.c, I do only two things. First, I re-raise any SIGINT, SIGTERM, or SIGHUP that the process may have received. Second, I call boinc_finish with the argument EXIT_SUCCESS, as evidenced by another line in that result.

I should add that there are also cases where the app restarts on failure. Here's one:

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Sieve started: 456189000000000 <= p < 456190000000000
Thread 0 starting
Detected GPU 0: Device Emulation (CPU)
Detected compute capability: 9999.9999
Detected 16 multiprocessors.
Insufficient available memory on GPU 0.
16:42:43 (6472): called boinc_finish
Sieve started: 456189000000000 <= p < 456190000000000
Thread 0 starting
Detected GPU 0: Device Emulation (CPU)
Detected compute capability: 9999.9999
Detected 16 multiprocessors.
Insufficient available memory on GPU 0.
16:45:26 (8132): called boinc_finish
</stderr_txt>
]]>

(The message text is German for "Incorrect function.") The reason for the failure, trying to run on the CUDA emulator, isn't relevant here. What is relevant is that BOINC ran the app again after it had already called boinc_finish(1) once. Plus, there are no signals trapped on that path.

So why did BOINC restart my app?
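To make the signal question concrete, here is a minimal sketch of the re-raise pattern described above. It is not the code from main.c: the names pending_signal, record_signal, install_handlers, and reraise_pending are made up for illustration, and the final boinc_finish(EXIT_SUCCESS) call is only shown in a comment so the sketch compiles without the BOINC headers.

```c
#include <signal.h>

/* Hypothetical name: last terminating signal seen, or 0 if none. */
volatile sig_atomic_t pending_signal = 0;

/* Hypothetical handler: only record the signal so the sieve can wind down. */
void record_signal(int sig)
{
    pending_signal = sig;
}

/* Install the recording handler for the signals mentioned above. */
void install_handlers(void)
{
    signal(SIGINT, record_signal);
    signal(SIGTERM, record_signal);
#ifdef SIGHUP /* not defined on every platform */
    signal(SIGHUP, record_signal);
#endif
}

/*
 * Re-raise a recorded signal with its default action restored.
 * Returns 0 if nothing was pending. If a signal was pending, the default
 * action for SIGINT, SIGTERM, and SIGHUP is to terminate the process, so
 * raise() normally does not return and nothing after this call runs.
 */
int reraise_pending(void)
{
    int sig = pending_signal;
    if (sig != 0) {
        signal(sig, SIG_DFL);
        raise(sig);
    }
    return sig;
}

/*
 * At the end of main(), the order described above would be:
 *     reraise_pending();
 *     boinc_finish(EXIT_SUCCESS);   (real BOINC API call, not stubbed here)
 */
```

In this sketch, if a signal was recorded, the process terminates inside reraise_pending() and the boinc_finish call is never reached. When no signal was recorded, reraise_pending() returns 0 and execution falls through to boinc_finish.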
And why didn't it count any of the runtime? Should I not re-raise those signals? I built the app with the development files that came with Ubuntu 9.04: version 6.2.18-3ubuntu1. Is that too old? Any other ideas?

Thanks!
Ken
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
