On Wed, Feb 10, 2010 at 6:46 AM, David Anderson <[email protected]> wrote:
> <ncpus> is for debugging, i.e. so that I can test
> problems with AQUA's 4-CPU app on my 2-CPU laptop.
>
> It's not intended as a user preference.

Yes, David. You know that. I know that. Most of those reading this list
know that.

But at the time of BOINC 6.2, without the separate scheduler for CPU
and GPU built in, it was introduced as a way to get work on projects
(mostly Seti) that gave out a CPU app and a GPU app doing the same
work. People wanted to have work for both the CPU and the GPU at the
same time, so some clever person found that misusing <ncpus> would do
just that, by setting <ncpus> to the number of CPUs plus the number of
GPUs, e.g. <ncpus>3</ncpus> on a dual core + 1 GPU.

That changed when BOINC 6.4 came along, which did have a separate
scheduler for CPU and GPU. Then people who ran that dual core CPU + GPU
found that their <ncpus>3</ncpus> flag would run 3 tasks on their
2 CPUs and one on their GPU.

Around that same time people were also asking for ways to run work only
on the GPU, not the CPU, and to do so on single systems, without
needing to use a completely separate venue (which is still a mythical
thing for some to set up and use). So two new settings were introduced
in quick succession. <ncpus>0</ncpus> would set zero CPUs.
<ncpus>-1</ncpus> would disable the flag altogether, for those lazy
people who can't delete the line from their cc_config.xml file.

> We already have prefs for limiting CPU use.
> If they don't provide the necessary level of control,
> we should change them so that they do.

I know you can set on the project level, through venues, whether the
computer(s) should use the CPU, the GPU, or both, but you can't set
that on the local level.

I know that intuitively it goes against nature to tell BOINC it cannot
use the CPU, but it appears some are able to do that with the
<ncpus>0</ncpus> setting at this time and still turn out normal work
on their GPUs only. How else does GPUGrid do it, where they do not have
a true CPU app?

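To spell out the three <ncpus> cases described above, here is a minimal
sketch in Python. It is not BOINC client code: the function name
effective_ncpus and its arguments are made up for illustration, and it
only mirrors the behaviour as described in this message (a positive
value overrides the detected count, 0 means zero CPUs, -1 means the
flag is ignored).

```python
from typing import Optional


def effective_ncpus(detected_cpus: int, ncpus_flag: Optional[int]) -> int:
    """Return the CPU count the client would act on (hypothetical helper).

    detected_cpus: number of CPUs actually found in the machine.
    ncpus_flag:    value of <ncpus> in cc_config.xml, or None if absent.
    """
    # No flag, or -1: the flag is disabled, so use the real CPU count.
    if ncpus_flag is None or ncpus_flag == -1:
        return detected_cpus
    # Anything below -1 has no meaning in the scheme described here.
    if ncpus_flag < -1:
        raise ValueError(f"unsupported <ncpus> value: {ncpus_flag}")
    # 0 means "no CPUs"; a positive value overrides the detected count.
    return ncpus_flag


if __name__ == "__main__":
    # Dual core + 1 GPU with the old <ncpus>3</ncpus> workaround.
    print(effective_ncpus(2, 3))
    # GPU-only setup with <ncpus>0</ncpus>.
    print(effective_ncpus(2, 0))
    # <ncpus>-1</ncpus>: same as not having the line at all.
    print(effective_ncpus(2, -1))
```

With a separate GPU scheduler, the first case is exactly the complaint
above: the client acts as if it had 3 CPUs on a 2-CPU machine, on top
of whatever it runs on the GPU.
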
> -- David
>
> Jorden van der Elst wrote:
>>
>> On Tue, Feb 9, 2010 at 8:53 PM, David Anderson <[email protected]>
>> wrote:
>>>
>>> There are no GPU-only apps.
>>> They all use some CPU.
>>> I guess you could say use at most 1% of processors
>>> (although that would still allow a 1-CPU app to run)
>>
>> The <ncpus> flag is there foremost to run more work than you actually
>> have CPUs in your system. The focus of this function should go back to
>> that flag. Now its use is confusing, while it's also being adopted by
>> some who think they know better than you, to set up their strict
>> amount of CPUs, instead of the "On multiprocessors" preference
>> setting.
>>
>> It's also used by some to tell BOINC to primarily focus on using all
>> GPUs in the system and (nigh on) no CPUs. Even with <ncpus>0</ncpus>
>> set at this time (no CPUs), BOINC will use part of one CPU to cater
>> for the GPUs. It'll be able to do so since the GPU apps will always be
>> started by a CPU, but the CPU doesn't do much more than translate the
>> task to kernels and transfer that to the GPU's memory banks, plus
>> write whatever their outcome is back to disk.
>>
>> Whether or not it gives the divide by zero problem Richard came
>> across, is something that needs to be tested. Nothing against Richard,
>> but he only saw it on one of his systems and he's the only one who saw
>> it thus far and only in the latest BOINC. Is he the only tester left
>> to BOINC? Can he reproduce that same error with all the previous BOINC
>> versions? Can it have been a fluke?
>>
>> Thus far the only problem we've ever seen with <ncpus>0</ncpus> is
>> people complaining that BOINC stopped using their CPUs after upgrading
>> from BOINC 6.2 to BOINC 6.4 or above. It's possible that some Linux
>> distros come with a BOINC version with a pre-installed cc_config.xml
>> with this flag set to zero as well. But that needs investigation.
>>
>> Does that give evidence the divide by zero error was never there? No.
>> But it doesn't give conclusive evidence that it was there either. For
>> all we know it was a cosmic ray that hit Richard's disk platter in the
>> exact position where that entry was for his client_state.xml file. ;-)
>>
>> I have seen (very localized) data transfer and disk-writing corruption
>> do strange things to entries in the client_state.xml file, while not
>> otherwise corrupting the file itself. Perhaps we need to have a
>> better check at BOINC start-up if all the contents of the present
>> client_state.xml file are somewhat the same as the ones in the last
>> backup in client_state_prev.xml that we made? Now we write too
>> quickly to the backup file, without a sanity check, thereby possibly
>> corrupting both.
>>
>> Back to the <ncpus> flag and its meaning. As I told you and Rom in
>> private, you changed how BOINC would recognize that the
>> service installation was used between BOINC 5 and 6 by going from
>> ENABLEPROTECTEDAPPLICATIONEXECUTION to
>> ENABLEPROTECTEDAPPLICATIONEXECUTION2, with BOINC 6 ignoring the
>> ENABLEPROTECTEDAPPLICATIONEXECUTION entry in the registry.
>>
>> Perhaps you need something similar for test flags that seem to
>> over-complicate things at this time. How can you easily reset their
>> use? By making them anew. Most all flags say in a way what they do,
>> like <memtest_debug> is used for memory debug tests, while
>> <sched_op_debug> is used for scheduler debug operations. So why can't
>> we rename <ncpus> to something that immediately tells us what its use
>> is, and use that one from the next BOINC major version onwards,
>> with it ignoring the previous entry?
>> E.g. <test_only_ncpus>, <test_nr_cpus> or something similar.
>>
>> Sorry for the wall of text. I had to be thorough in expressing my views.
>>
>
>

--
-- Jord.

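The rename idea from the quoted message can be sketched like this in
Python. Assumptions, all of them mine: <test_only_ncpus> is only one of
the names suggested above and not an existing BOINC option, the helper
read_test_ncpus is hypothetical, and the new flag is assumed to sit
under <options> in cc_config.xml, where <ncpus> sits today.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical new tag name, taken from the suggestion in the message.
NEW_TAG = "test_only_ncpus"


def read_test_ncpus(cc_config_xml: str) -> Optional[int]:
    """Return the value of the renamed test flag, or None if it is absent.

    A legacy <ncpus> entry is deliberately never looked at, so old
    cc_config.xml files stop having any effect after the rename.
    """
    root = ET.fromstring(cc_config_xml)
    node = root.find(f"./options/{NEW_TAG}")
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


OLD_CONFIG = (
    "<cc_config><options>"
    "<ncpus>3</ncpus>"
    "</options></cc_config>"
)

NEW_CONFIG = (
    "<cc_config><options>"
    "<ncpus>3</ncpus>"
    "<test_only_ncpus>4</test_only_ncpus>"
    "</options></cc_config>"
)

if __name__ == "__main__":
    # Old file: only the legacy entry is present, so nothing is applied.
    print(read_test_ncpus(OLD_CONFIG))
    # New file: the renamed flag wins and the legacy entry is ignored.
    print(read_test_ncpus(NEW_CONFIG))
```

This is the same trick as the ENABLEPROTECTEDAPPLICATIONEXECUTION2
change: nobody has to clean up their old file, the old entry simply
stops counting.
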
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
