It breaks "hybrid" applications pretty badly -- applications that do part of the work on the CPU and part of the work on the GPU.
The resulting simplification may be worth it -- or not.

On 3/29/2010 9:08 AM, Jim Preston wrote:
> I for one, like Lynn's approach best. I do not have a clue how much
> work would be involved, but I like the approach of being able to
> easily see how each device is performing (if this is in fact a result
> of assigning each device a host id)
>
> --------
> Jim Preston
> BOINC Support Volunteer
> [email protected]
> SKYPE jhparizona-boinc
>
> On Mar 28, 2010, at 1:51 PM, Lynn W. Taylor wrote:
>
>> I don't see how "pollution" justifies this much work. Certainly the
>> "pollution" from invalid work is less important than the pollution from
>> a science application that selectively kills unfavorable work.
>>
>> That said, the simplest solution may be to treat each device as a
>> separate host. Different host ID, different queue, everything.
>>
>> Cruncher with three CUDA cards, four host ids.
>>
>> On 3/28/2010 1:21 PM, Raistmer wrote:
>>> Unfortunately, binding quota to device type (instead of device instance)
>>> will not solve current issues with multi-GPU hosts.
>>> Such hosts (or hosts with multi-core GPU) can do correct computations on one
>>> GPU (GPU core) and incorrect (for example, constantly throwing -9 overflow
>>> in SETI project) ones on another.
>>> IMHO no need to implement full-scale scheduling algorithms (I suppose this
>>> thing you called modeling) per-device basis.
>>> All that would be needed is just additional field in structure that
>>> describes device.
>>> When work assigned to device BOINC knows to what particular device it
>>> assigns particular task. Then it could check (client, not server) outcome of
>>> this particular result (was computational error or not) and update
>>> corresponding field in structure for particular device.
>>> Sure, it can't catch invalid results, invalid status will be known only
>>> after validation, i.e. server should be involved.
>>> But such simplified mechanism could check computational (in particular,
>>> SETI's -9 overflow or CUDA-specific -1, not implemented) errors.
>>> Unfortunately, there are complications, overflow can be thrown for
>>> completely valid result too, but here rate of such errors could play some
>>> role...
>>> As bigger extension, BOINC client could attach additional field with device
>>> ID when reporting result to server.
>>> On next request server could tell client updated good/bad ratio for each
>>> device ID. Devices with poor good/bad ratios could be disabled for some
>>> period of time (something like device-wide backoff in computations). Here
>>> server-side changes required, but again, no need to do full-scale scheduling
>>> on per-device basis. Actually, scheduling should not be touched at all.
>>> BOINC client could just disable/enable corresponding devices according to
>>> device good/bad ratio (this would just decrease number of devices available
>>> for scheduling, AFAIK BOINC currently should deal with same situation. For
>>> example, number of available devices changes when user starts "no-GPU" app).
>>>
>>> ----- Original Message -----
>>> From: "David Anderson"<[email protected]>
>>> To: "Raistmer"<[email protected]>
>>> Cc:<[email protected]>
>>> Sent: Sunday, March 28, 2010 11:23 PM
>>> Subject: Re: [boinc_dev] BOINC's Quota system needs change
>>>
>>>> The new system (see updated doc:
>>>> http://boinc.berkeley.edu/trac/wiki/CreditNew)
>>>> will have separate quotas and error rates per resource type
>>>> (CPU, NVIDIA, ATI).
>>>>
>>>> Maintaining these separately for each GPU would require
>>>> modeling multiple GPUs separately,
>>>> rather than as N instances of the same thing as is currently done.
>>>> This would be a sweeping change, and won't get done in the near term.
>>>>
>>>> -- David
>>>>
>>>> Raistmer wrote:
>>>>> If hosts' task quota computed in old way, host that does valid CPU
>>>>> computations but invalid GPU ones will pollute database and waste project
>>>>> resource indefinitely.
>>>>> GPU usually much faster than CPU so many invalid tasks can be returned
>>>>> per single valid one.
>>>>> Moreover, even if CPU/GPU quota separation will be introduced, there are
>>>>> still multi GPU hosts that can pollute database with even bigger rate
>>>>> doing correct computations on one GPU and invalid ones on others.
>>>>> Current quota system applicable only to single host-single device
>>>>> approach and apparently should be changed.
>>>>> Right now I have no good idea what replacement can be, but this question
>>>>> definitely deserves consideration.
>>>>>
>>>>> One possible solution could be to track good/bad results ratio per
>>>>> hardware device (not per host) and inhibit work fetch for whole host if
>>>>> one of its devices has too bad good/bad ratio. Or issue some instruction
>>>>> to BOINC client to block affected device from receiving work (it could be
>>>>> more graceful approach).
>>>>> More ideas?
>>>>>
>>>>> _______________________________________________
>>>>> boinc_dev mailing list
>>>>> [email protected]
>>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>>> To unsubscribe, visit the above URL and
>>>>> (near bottom of page) enter your email address.
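
For comparison with the separate-host-id idea, the lighter client-side bookkeeping Raistmer describes in the quoted thread (one extra field per device, updated from each result's outcome, plus a device-wide backoff) might look roughly like the sketch below. Everything in it is hypothetical: the names, the thresholds and the backoff length are made up for illustration, none of it is BOINC code, and it is written in Python rather than the client's C++ only to keep it short. It uses an error rate rather than a single failure because, as noted in the thread, an overflow can also be thrown for a completely valid result.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Per-device outcome counters (hypothetical; not a BOINC structure)."""
    device_id: int
    ok: int = 0
    errors: int = 0
    disabled_until: float = 0.0  # timestamp; 0.0 means never backed off

    def error_rate(self) -> float:
        total = self.ok + self.errors
        return self.errors / total if total else 0.0


def record_outcome(dev, had_error, now,
                   min_results=10, max_error_rate=0.5, backoff_secs=3600.0):
    """Update one device's counters after a task finishes on it.

    The thresholds are illustrative assumptions. The device is backed off
    only once enough results exist for the rate to mean something.
    """
    if had_error:
        dev.errors += 1
    else:
        dev.ok += 1
    total = dev.ok + dev.errors
    if total >= min_results and dev.error_rate() > max_error_rate:
        dev.disabled_until = now + backoff_secs


def usable_devices(devices, now):
    """Devices not currently backed off; scheduling itself is untouched."""
    return [d for d in devices if now >= d.disabled_until]
```

The scheduler would simply see fewer devices while one is backed off, much as when the number of available devices changes for other reasons. Invalid (as opposed to errored) results would still need the server, since validity is only known after validation.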
