I, for one, like Lynn's approach best. I do not have a clue how much
work would be involved, but I like being able to easily see how each
device is performing (if this is in fact a result of assigning each
device its own host ID).
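A minimal Python sketch of the per-device good/bad accounting discussed in the quoted messages below (Raistmer's device-wide backoff, applied to Lynn's one-ID-per-device view). DeviceStats, DeviceTracker, the thresholds, and device IDs like "cuda0" are hypothetical illustrations, not part of BOINC's client or server code.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Outcome counters for one device (hypothetical, not a BOINC struct)."""
    good: int = 0
    bad: int = 0
    backoff_until: float = 0.0  # device receives no work before this time

    def error_rate(self) -> float:
        total = self.good + self.bad
        return self.bad / total if total else 0.0


class DeviceTracker:
    """Tracks results per device ID and backs off devices that fail too often."""

    def __init__(self, max_error_rate: float = 0.5, min_samples: int = 10,
                 backoff_secs: float = 3600.0) -> None:
        self.max_error_rate = max_error_rate  # tolerated fraction of errors
        self.min_samples = min_samples        # avoid judging on a few results
        self.backoff_secs = backoff_secs
        self.devices: dict[str, DeviceStats] = {}

    def report(self, device_id: str, ok: bool, now: float) -> None:
        """Record one result outcome for the device that computed it."""
        stats = self.devices.setdefault(device_id, DeviceStats())
        if ok:
            stats.good += 1
        else:
            stats.bad += 1
        if (stats.good + stats.bad >= self.min_samples
                and stats.error_rate() > self.max_error_rate):
            # Device-wide backoff; counters restart so the device is
            # re-evaluated from scratch once the backoff expires.
            stats.backoff_until = now + self.backoff_secs
            stats.good = stats.bad = 0

    def usable(self, device_id: str, now: float) -> bool:
        """True if the device may be offered work at time `now`."""
        stats = self.devices.get(device_id)
        return stats is None or now >= stats.backoff_until
```

The min_samples and max_error_rate knobs stand in for the caveat raised below that an overflow can also accompany a valid result, so only a sustained error rate takes a device out of scheduling; the other devices on the same host keep getting work.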

--------
Jim Preston
BOINC Support Volunteer
[email protected]
SKYPE jhparizona-boinc




On Mar 28, 2010, at 1:51 PM, Lynn W. Taylor wrote:

> I don't see how "pollution" justifies this much work.  Certainly the
> "pollution" from invalid work is less important than the pollution from
> a science application that selectively kills unfavorable work.
>
> That said, the simplest solution may be to treat each device as a
> separate host.  Different host ID, different queue, everything.
>
> Cruncher with three CUDA cards, four host ids.
>
> On 3/28/2010 1:21 PM, Raistmer wrote:
>> Unfortunately, binding the quota to device type (instead of device instance)
>> will not solve the current issues with multi-GPU hosts.
>> Such hosts (or hosts with multi-core GPUs) can do correct computations on one
>> GPU (GPU core) and incorrect ones on another (for example, constantly
>> throwing -9 overflow in the SETI project).
>> IMHO there is no need to implement full-scale scheduling algorithms (I
>> suppose this is what you called modeling) on a per-device basis.
>> All that would be needed is an additional field in the structure that
>> describes a device.
>> When work is assigned, BOINC knows which particular device it assigns each
>> task to. The client (not the server) could then check the outcome of that
>> particular result (computational error or not) and update the corresponding
>> field in the structure for that device.
>> Of course, this can't catch invalid results; invalid status is known only
>> after validation, i.e. the server has to be involved.
>> But such a simplified mechanism could catch computational errors (in
>> particular, SETI's -9 overflow or the CUDA-specific -1, not implemented).
>> Unfortunately, there are complications: overflow can be thrown for a
>> completely valid result too, but here the rate of such errors could play
>> some role...
>> As a bigger extension, the BOINC client could attach an additional field
>> with the device ID when reporting a result to the server.
>> On the next request, the server could tell the client the updated good/bad
>> ratio for each device ID. Devices with poor good/bad ratios could be
>> disabled for some period of time (something like a device-wide backoff in
>> computations). Server-side changes are required here, but again, there is
>> no need to do full-scale scheduling on a per-device basis. Actually,
>> scheduling should not be touched at all.
>> The BOINC client could just disable/enable the corresponding devices
>> according to their good/bad ratios (this would just decrease the number of
>> devices available for scheduling; AFAIK BOINC already has to deal with the
>> same situation, e.g. the number of available devices changes when the user
>> starts a "no-GPU" app).
>>
>> ----- Original Message -----
>> From: "David Anderson"<[email protected]>
>> To: "Raistmer"<[email protected]>
>> Cc:<[email protected]>
>> Sent: Sunday, March 28, 2010 11:23 PM
>> Subject: Re: [boinc_dev] BOINC's Quota system needs change
>>
>>
>>> The new system (see updated doc:
>>> http://boinc.berkeley.edu/trac/wiki/CreditNew)
>>> will have separate quotas and error rates per resource type
>>> (CPU, NVIDIA, ATI).
>>>
>>> Maintaining these separately for each GPU would require
>>> modeling multiple GPUs separately,
>>> rather than as N instances of the same thing as is currently done.
>>> This would be a sweeping change, and won't get done in the near term.
>>>
>>> -- David
>>>
>>> Raistmer wrote:
>>>> If a host's task quota is computed the old way, a host that does valid CPU
>>>> computations but invalid GPU ones will pollute the database and waste
>>>> project resources indefinitely.
>>>> A GPU is usually much faster than a CPU, so many invalid tasks can be
>>>> returned per single valid one.
>>>> Moreover, even if CPU/GPU quota separation is introduced, there are still
>>>> multi-GPU hosts that can pollute the database at an even higher rate by
>>>> doing correct computations on one GPU and invalid ones on the others.
>>>> The current quota system is applicable only to a single-host,
>>>> single-device approach and apparently should be changed.
>>>> Right now I have no good idea what the replacement could be, but this
>>>> question definitely deserves consideration.
>>>>
>>>> One possible solution could be to track the good/bad result ratio per
>>>> hardware device (not per host) and inhibit work fetch for the whole host
>>>> if one of its devices has too bad a good/bad ratio. Or the server could
>>>> issue some instruction to the BOINC client to block the affected device
>>>> from receiving work (this could be a more graceful approach).
>>>> More ideas?
>>>>
>>>> _______________________________________________
>>>> boinc_dev mailing list
>>>> [email protected]
>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>> To unsubscribe, visit the above URL and
>>>> (near bottom of page) enter your email address.
>>>
>>
>>
