Re: [boinc_dev] BOINC's Quota system needs change

Lynn W. Taylor Mon, 29 Mar 2010 09:45:15 -0700

Treating each device as a separate host breaks "hybrid" applications
pretty badly -- applications that do part of the work on the CPU and
part of the work on the GPU.


The resulting simplification may be worth it -- or not.

On 3/29/2010 9:08 AM, Jim Preston wrote:
> I, for one, like Lynn's approach best. I do not have a clue how much
> work would be involved, but I like the approach of being able to
> easily see how each device is performing (if this is in fact a result
> of assigning each device a host ID).
>
> --------
> Jim Preston
> BOINC Support Volunteer
> [email protected]
> SKYPE jhparizona-boinc
>
>
>
>
> On Mar 28, 2010, at 1:51 PM, Lynn W. Taylor wrote:
>
>> I don't see how "pollution" justifies this much work.  Certainly the
>> "pollution" from invalid work is less important than the pollution from
>> a science application that selectively kills unfavorable work.
>>
>> That said, the simplest solution may be to treat each device as a
>> separate host.  Different host ID, different queue, everything.
>>
>> A cruncher with three CUDA cards would have four host IDs (one for the
>> CPU and one per card).
>>
>> On 3/28/2010 1:21 PM, Raistmer wrote:
>>> Unfortunately, binding the quota to device type (instead of device
>>> instance) will not solve the current issues with multi-GPU hosts.
>>> Such hosts (or hosts with a multi-core GPU) can do correct computations
>>> on one GPU (GPU core) and incorrect ones (for example, constantly
>>> throwing the -9 overflow in the SETI project) on another.
>>> IMHO there is no need to implement full-scale scheduling algorithms (I
>>> suppose this is what you called modeling) on a per-device basis.
>>> All that would be needed is an additional field in the structure that
>>> describes a device.
>>> When work is assigned to a device, BOINC knows which particular device
>>> gets which task. Then it (the client, not the server) could check the
>>> outcome of that result (computational error or not) and update the
>>> corresponding field in the structure for that device.
>>> Sure, it can't catch invalid results: invalid status is known only
>>> after validation, i.e. the server has to be involved.
>>> But such a simplified mechanism could catch computational errors (in
>>> particular, SETI's -9 overflow or the CUDA-specific -1, not implemented).
>>> Unfortunately, there are complications: an overflow can be thrown for a
>>> completely valid result too, but here the rate of such errors could
>>> play some role...
>>> As a bigger extension, the BOINC client could attach an additional field
>>> with the device ID when reporting a result to the server.
>>> On the next request the server could tell the client the updated good/bad
>>> ratio for each device ID. Devices with poor good/bad ratios could be
>>> disabled for some period of time (something like a device-wide backoff in
>>> computations). This requires server-side changes, but again, there is no
>>> need to do full-scale scheduling on a per-device basis. Actually,
>>> scheduling should not be touched at all.
>>> The BOINC client could just disable/enable the corresponding devices
>>> according to their good/bad ratio (this would just decrease the number of
>>> devices available for scheduling; AFAIK BOINC already has to deal with
>>> this situation. For example, the number of available devices changes
>>> when the user starts a "no-GPU" app).
>>>
>>> ----- Original Message -----
>>> From: "David Anderson"<[email protected]>
>>> To: "Raistmer"<[email protected]>
>>> Cc:<[email protected]>
>>> Sent: Sunday, March 28, 2010 11:23 PM
>>> Subject: Re: [boinc_dev] BOINC's Quota system needs change
>>>
>>>
>>>> The new system (see updated doc:
>>>> http://boinc.berkeley.edu/trac/wiki/CreditNew)
>>>> will have separate quotas and error rates per resource type
>>>> (CPU, NVIDIA, ATI).
>>>>
>>>> Maintaining these separately for each GPU would require
>>>> modeling multiple GPUs separately,
>>>> rather than as N instances of the same thing as is currently done.
>>>> This would be a sweeping change, and won't get done in the near term.
>>>>
>>>> -- David
>>>>
>>>> Raistmer wrote:
>>>>> If a host's task quota is computed in the old way, a host that does valid
>>>>> CPU computations but invalid GPU ones will pollute the database and
>>>>> waste project resources indefinitely.
>>>>> A GPU is usually much faster than a CPU, so many invalid tasks can be
>>>>> returned per single valid one.
>>>>> Moreover, even if CPU/GPU quota separation is introduced, there are
>>>>> still multi-GPU hosts that can pollute the database at an even higher
>>>>> rate, doing correct computations on one GPU and invalid ones on another.
>>>>> The current quota system fits only a single-host, single-device
>>>>> approach and apparently should be changed.
>>>>> Right now I have no good idea what the replacement could be, but this
>>>>> question definitely deserves consideration.
>>>>>
>>>>> One possible solution could be to track the good/bad result ratio per
>>>>> hardware device (not per host) and inhibit work fetch for the whole
>>>>> host if one of its devices has too poor a good/bad ratio. Or issue an
>>>>> instruction to the BOINC client to block the affected device from
>>>>> receiving work (this could be a more graceful approach).
>>>>> More ideas?
>>>>>
>>>>> _______________________________________________
>>>>> boinc_dev mailing list
>>>>> [email protected]
>>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>>> To unsubscribe, visit the above URL and
>>>>> (near bottom of page) enter your email address.
>>>>
>>>
>>>
>
>
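
The client-side bookkeeping Raistmer describes in the quoted message (one extra record per device, updated from each task's outcome, with a temporary backoff for devices that fail too often) could be sketched roughly as below. This is only an illustration in Python, not BOINC code: DeviceStats, update_backoff, usable_devices, and all threshold values are hypothetical names and numbers chosen for the example. It uses an error rate rather than a raw error count because, as noted in the thread, an overflow can also come from a completely valid result.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Hypothetical per-device record; not an actual BOINC structure."""
    device_id: int
    good: int = 0                 # results that finished without a computational error
    bad: int = 0                  # results that ended in a computational error
    disabled_until: float = 0.0   # timestamp until which the device is backed off

    def record(self, ok: bool) -> None:
        """Update the counters from the outcome of one task run on this device."""
        if ok:
            self.good += 1
        else:
            self.bad += 1

    def error_rate(self) -> float:
        total = self.good + self.bad
        return self.bad / total if total else 0.0


def update_backoff(dev: DeviceStats, now: float,
                   max_error_rate: float = 0.5,    # assumed threshold
                   min_results: int = 10,          # assumed minimum sample size
                   backoff_seconds: float = 3600.0) -> bool:
    """Back the device off if its error rate is too high.

    Returns True if the device was disabled by this call.
    """
    enough_data = dev.good + dev.bad >= min_results
    if enough_data and dev.error_rate() > max_error_rate:
        dev.disabled_until = now + backoff_seconds
        dev.good = dev.bad = 0    # start counting afresh once the backoff expires
        return True
    return False


def usable_devices(devices: list[DeviceStats], now: float) -> list[int]:
    """IDs of the devices that may currently be used; scheduling itself is untouched."""
    return [d.device_id for d in devices if d.disabled_until <= now]
```

With two GPUs where one fails every task, only the failing device drops out of the usable list, and it returns once the backoff period has passed. Invalid (as opposed to errored) results would still need the server-side extension discussed in the thread.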
