On 24-Sep-2011 10:29 PM, Jacob Klein wrote:
 > David,
 >
 > In order to set up the scenario, I removed my GPUGrid and Einstein tasks, then
 > let the client figure out what to do. As it went asking projects for more work,
 > GPUGrid.net got 5 GPU tasks, and then work fetch stopped asking for NVIDIA work,
 > even though none of those 5 tasks were applicable to the other 2 video cards,
 > based on <exclude_gpu> configuration.
 >

Jacob:
I fixed this (see notes below).
It was a subtle problem, and using your client state file
as input to the client simulator was key in figuring it out.
-- David

     - client: fix a bug reported by Jacob Klein,
         where work fetch didn't work right in the presence of
         multiple GPUs and <exclude_gpu> config options.
         For example, suppose:
             - you have 2 GPUs and 2 projects
             - Project A is excluded from GPU 1
             - you have lots of jobs for project A
         Then the client won't try to fetch jobs from project B.
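
The example above can be illustrated with a minimal Python sketch.
This is not the actual client code; the function name and the
job counts are hypothetical, chosen only to show why a simulation
that ignores exclusions sees no idle GPU and so never asks project B.

```python
def idle_gpus_naive(n_gpus, jobs_per_project):
    """Count idle GPUs assuming every queued job may run on any GPU.

    This mirrors the buggy assumption: exclusions are ignored.
    """
    busy = min(n_gpus, sum(jobs_per_project.values()))
    return n_gpus - busy


# 2 GPUs; project A has lots of jobs, project B has none.
# A is actually excluded from GPU 1, but the naive count does not know that.
naive_idle = idle_gpus_naive(2, {"A": 10, "B": 0})
print(naive_idle)
```

With no idle GPU reported, there is no apparent shortfall,
so nothing triggers a work request to project B.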

         The problem had 2 parts:
         a) round-robin simulation wasn't taking GPU exclusions into account.
             In the above example, it would think that both GPUs had jobs.
             I fixed this by computing the # of GPUs from which each project
             is excluded, and using this in the RR simulation.
         b) Once this was done, I needed to make the client
             request GPU jobs from project B rather than project A.
             I did this with the following policy:
             If a project has excluded GPUs of a given type,
             and has a runnable job of that type,
             don't ask it for more work of that type.
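
Parts a) and b) can be sketched roughly as follows, in Python.
This is a simplified illustration under stated assumptions, not the
client's real code: the Project class, its fields, and the function
names are all hypothetical, and only a single GPU type is modeled.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    n_excluded: int = 0         # number of GPUs this project is excluded from
    runnable_gpu_jobs: int = 0  # runnable jobs for this GPU type


def idle_gpus(projects, n_gpus):
    """Part a): a project can occupy at most (n_gpus - n_excluded) GPUs."""
    busy = sum(min(p.runnable_gpu_jobs, n_gpus - p.n_excluded) for p in projects)
    return n_gpus - min(n_gpus, busy)


def may_request_gpu_work(p):
    """Part b): skip a project that has excluded GPUs and a runnable job."""
    return not (p.n_excluded > 0 and p.runnable_gpu_jobs > 0)


def projects_to_ask(projects, n_gpus):
    """Names of projects to ask for GPU work, or [] if no GPU is idle."""
    if idle_gpus(projects, n_gpus) == 0:
        return []
    return [p.name for p in projects if may_request_gpu_work(p)]


a = Project("A", n_excluded=1, runnable_gpu_jobs=10)
b = Project("B")
print(idle_gpus([a, b], 2), projects_to_ask([a, b], 2))
```

In this sketch the count-based cap in idle_gpus() exposes the idle
GPU, and the policy check steers the request to project B.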

         Notes:
         - the policy in b) is crude, and it means that work-buffer
             preferences are ignored in some cases.
         - neither a) nor b) takes into account app-level exclusions.

         I could fix both of these with a lot of work,
         but I'd rather move to a model in which dissimilar GPUs
         are modeled as different resources,
         which would remove the need for the <exclude_gpu> mechanism
         in the first place.

         Other note: I figured out this problem using the client simulator,
         based on the client state file that Jacob sent me.

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
