What's EXIT_SUCCESS?  Call boinc_finish(0).
-- David

On 21-Aug-2010 11:13 AM, Ken Brazier wrote:
> Since this is my first post here, I guess I should introduce myself.
> I'm a volunteer developer for PrimeGrid.  I've managed to make a couple
> of native BOINC applications for them, but one of them, a CUDA app, is
> doing strange things in some cases.
>
> Here's the stderr output showing the main problem I'm asking about:
>
> <core_client_version>6.10.17</core_client_version>
> <![CDATA[
> <message>
> Maximum elapsed time exceeded
> </message>
> <stderr_txt>
> Sieve started: 420825000000000<= p<  420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p<  420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.46 sec. (0.05 init + 1690.41 sieve) at 591620 p/sec.
> Processor time: 10.50 sec. (0.06 init + 10.44 sieve) at 95793042 p/sec.
> Average processor utilization: 1.24 (init), 0.01 (sieve)
> Sieve started: 420825000000000<= p<  420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p<  420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.48 sec. (0.05 init + 1690.43 sieve) at 591612 p/sec.
> Processor time: 10.50 sec. (0.05 init + 10.45 sieve) at 95701374 p/sec.
> Average processor utilization: 1.03 (init), 0.01 (sieve)
> Sieve started: 420825000000000<= p<  420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p<  420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.48 sec. (0.05 init + 1690.43 sieve) at 591612 p/sec.
> Processor time: 10.51 sec. (0.05 init + 10.46 sieve) at 95609881 p/sec.
> Average processor utilization: 1.03 (init), 0.01 (sieve)
> Sieve started: 420825000000000<= p<  420826000000000
> Thread 0 starting
> Detected GPU 0: GeForce 9800 GT
> Detected compute capability: 1.1
> Detected 14 multiprocessors.
>
> Thread 0 completed
> Waiting for threads to exit
> Sieve complete: 420825000000000<= p<  420826000000000
> count=29694269,sum=0x6a4d4aa8c5ece825
> Elapsed time: 1690.47 sec. (0.05 init + 1690.42 sieve) at 591614 p/sec.
> Processor time: 10.49 sec. (0.05 init + 10.44 sieve) at 95793042 p/sec.
> Average processor utilization: 1.03 (init), 0.01 (sieve)
>
> </stderr_txt>
> ]]>
>
> The app completes, apparently successfully, but then for some reason
> BOINC restarts it, again and again! Also, although the app used about 10
> CPU-seconds on each of its four runs (that's normal for CUDA apps) and
> accumulated 6,724.62 seconds of runtime in total, BOINC recorded 0 seconds
> of CPU time.
>
> If you're interested, the code is v0.1.5, from here:
> http://github.com/Ken-g6/PSieve-CUDA
>
> After that last fprintf in main.c, I do only two things. First, I
> re-raise any SIGINT, SIGTERM, or SIGHUP that the process may have
> received. Second, I call boinc_finish with the argument EXIT_SUCCESS,
> as evidenced by another line in that result.
>
> I should add that there are also cases where the app restarts on
> failure. Here's one:
>
> <core_client_version>6.10.58</core_client_version>
> <![CDATA[
> <message>
> Incorrect function. (0x1) - exit code 1 (0x1)
> </message>
> <stderr_txt>
> Sieve started: 456189000000000<= p<  456190000000000
> Thread 0 starting
> Detected GPU 0: Device Emulation (CPU)
> Detected compute capability: 9999.9999
> Detected 16 multiprocessors.
> Insufficient available memory on GPU 0.
> 16:42:43 (6472): called boinc_finish
> Sieve started: 456189000000000<= p<  456190000000000
> Thread 0 starting
> Detected GPU 0: Device Emulation (CPU)
> Detected compute capability: 9999.9999
> Detected 16 multiprocessors.
> Insufficient available memory on GPU 0.
> 16:45:26 (8132): called boinc_finish
>
> </stderr_txt>
> ]]>
>
> The reason for the failure (trying to run on the CUDA emulator) isn't
> relevant here. What is relevant is that BOINC tried again after the app
> had already called boinc_finish(1). Plus, no signals are trapped on that path.
>
> So why did BOINC restart my app? And why didn't it count any of the
> runtime? Should I not re-raise those signals? I built the app with the
> BOINC development files that came with Ubuntu 9.04 (version 6.2.18-3ubuntu1).
> Is that too old? Any other ideas?
>
> Thanks!
>
> Ken
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.