I think this might be a thread (as opposed to application) priority issue. Compare these two Process Explorer screenshots:

http://img834.imageshack.us/img834/9496/einsteincudapriority.png
http://img703.imageshack.us/img703/3087/seticudapriority.png
(both taken from the same computer, during the same BOINC session). The worker thread for the Einstein application is running at priority one, whereas the equivalent thread for SETI is running at priority six. (The main application thread is running at priority six in both cases.)

The SETI application in the screenshot is the original setiathome_6.08_windows_intelx86__cuda.exe written by NVidia for SETI's cuda launch in January 2009, so NVidia should be able to explain how to work round the discrepancy.

Note that thread priority is (AFAIK) a Windows-only concept, so this probably won't help with your Linux/Mac issues.

----- Original Message -----
From: Oliver Bock
To: David Anderson ; boinc_dev ; Boinc Projects
Sent: Monday, December 20, 2010 11:12 AM
Subject: [boinc_dev] CUDA task scheduling

Hi everyone,

We just deployed a new CUDA application (called BRP3) as part of the einst...@home project. This app uses up to roughly 75% of a GPU and 3-30% of a CPU, depending on the GPU model/performance. Thus our scheduler currently issues these tasks with the following settings:

hu.avg_ncpus = 0.2
hu.ncudas = 1

Please note that BOINC (e.g. sched/sched_customize) revision 22832 is used in this case.

The problem is that with the settings above BOINC starts CUDA tasks in addition to CPU tasks that already occupy all existing CPU cores. This means that on a system with four CPU cores and two CUDA devices, four CPU tasks and two CUDA tasks are launched. Although this behavior is intended, it doesn't really work out for us because the performance of the CUDA tasks is degraded significantly: GPU usage goes down to less than 10%, increasing the runtime by the same factor. Although the CUDA tasks run with slightly higher priority (below normal on Windows) than the CPU tasks (low on Windows), they are limited by the already fully occupied CPU cores, which are still required for up to 30% of the computation.
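To make the oversubscription concrete, here is a rough sketch of the arithmetic for the four-core, two-GPU case described above. It is illustrative only: the `Task` struct and the helper functions are hypothetical names and not part of BOINC, and the CPU share of 1.0 per CPU task is an assumption. Only the 0.2 and 1 values come from the settings quoted above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical task description for illustration; not a BOINC type.
struct Task {
    double avg_ncpus;  // declared fraction of a CPU core the task uses
    int ncudas;        // number of CUDA devices the task occupies
};

// Sum the declared CPU shares of all running tasks.
double total_cpu_demand(const std::vector<Task>& tasks) {
    double sum = 0.0;
    for (const Task& t : tasks) {
        sum += t.avg_ncpus;
    }
    return sum;
}

// True if the declared CPU demand exceeds the number of cores.
bool is_oversubscribed(const std::vector<Task>& tasks, int ncores) {
    assert(ncores > 0);
    return total_cpu_demand(tasks) > static_cast<double>(ncores);
}

// The scenario from the text: four CPU tasks plus two CUDA tasks.
std::vector<Task> make_example_tasks() {
    std::vector<Task> tasks;
    for (int i = 0; i < 4; ++i) {
        tasks.push_back({1.0, 0});  // assumed: a CPU task uses one full core
    }
    for (int i = 0; i < 2; ++i) {
        tasks.push_back({0.2, 1});  // hu.avg_ncpus = 0.2, hu.ncudas = 1
    }
    return tasks;
}
```

Calling `is_oversubscribed(make_example_tasks(), 4)` returns true, since the declared demand is about 4.4 cores on a four-core machine. The real shortfall can be larger, because the CUDA tasks may need up to 30% of a core rather than the declared 20%.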
Since we couldn't yet release a Linux or Mac OS version we don't know whether this is a Windows time-slicing issue or not. Are there any other projects running CUDA tasks in a comparable way? The only workaround in sight would be to acquire a full CPU core once again but that's certainly not ideal.

Any ideas are welcome!

Cheers,
Oliver

_______________________________________________
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.