David Anderson wrote on 16/07/2009 22:24:
I checked in the following change,
which may solve the problem described below.

I'm afraid that didn't work, David. I patched your change into my 6.6.37 build, and a scheduler request for 0 seconds of CPU and CUDA work was sent to CPDN on every work fetch cycle (RSC_WORK_FETCH::choose_project() decides that the project is in CPU major shortfall; see the attached debug log).

When I disabled work fetch for CPDN, the same 0-second work request cycle started happening for CPDN Beta and WCG.

- client: change the way a resource's "estimated delay"
    (passed to server for crude deadline check) is computed.
    Old: estimated delay is the interval for which the resource
        is fully used (i.e., all instances busy).
    Problem: this may cause unnecessary project starvation.
        example: 1 CPU machine, has a month-long CPDN job
        with a 1-year deadline (it's not in deadline trouble).
        Then the CPU estimated delay will be 1 month,
        and the client won't get any work from projects
        with deadlines shorter than 1 month.
    New: estimated delay is the latest time at which the
        resource is fully used and is being used by at least 1 job
        that is projected to miss its deadline under RR.

    Note: this isn't precise, but I don't think we can improve it
    much without getting a lot more complex.
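
To make the old and new rules in the note above concrete, here is a minimal sketch. It is not the client's code: the SimSlice struct, its fields and both function names are hypothetical, and it assumes the round-robin simulation can be summarised as consecutive time slices, each recording how many instances are busy and whether any job running in it is projected to miss its deadline.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of one slice of the round-robin (RR) simulation timeline.
struct SimSlice {
    double end_time;         // seconds from now at which this slice ends
    int busy_instances;      // instances of the resource in use during the slice
    bool has_deadline_miss;  // a job running in this slice is projected to miss its deadline
};

// Old rule (as read from the note): how long the resource stays fully used,
// counted from now until the first slice in which an instance is free.
double estimated_delay_old(const std::vector<SimSlice>& slices, int ninstances) {
    double delay = 0.0;
    for (const SimSlice& s : slices) {
        if (s.busy_instances < ninstances) break;
        delay = s.end_time;
    }
    return delay;
}

// New rule: the latest time at which the resource is fully used AND is being
// used by at least one job projected to miss its deadline.
double estimated_delay_new(const std::vector<SimSlice>& slices, int ninstances) {
    double delay = 0.0;
    for (const SimSlice& s : slices) {
        if (s.busy_instances >= ninstances && s.has_deadline_miss) {
            delay = std::max(delay, s.end_time);
        }
    }
    return delay;
}
```

For the 1-CPU example in the note (a single month-long slice with no projected deadline miss), the old rule gives a delay of about a month and the new rule gives 0.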

Ian Hay wrote:
Jonathan Stephenson wrote on 16/07/2009 16:36:
I've noticed something strange with versions 36 and 37 as far as work
requests go. When I upgraded to either version, some projects like DrugDiscovery stated that the WUs would not complete in time (BOINC on 99... and active 97 or something like that). In particular, DrugDiscovery has a resource share
ten times that of any other right now. When I downgraded to 31, DrugDiscovery
instantly downloaded 43 WUs.

Since the "To completion" for the DrugDiscovery WUs is 1:46 minutes I doubt
the error is in the 31 version.

I've looked at the changes between 6.6.31 and 6.6.37, and it looks like the removal of the line

    estimated_delay = 0;

from RSC_WORK_FETCH::clear_request() in changeset 18296 is responsible.

I've just self-compiled 6.6.37 after adding the line back in and I'm now downloading work from work-starved projects.
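
To illustrate why that one line matters, here is a simplified sketch. The struct and both method names are made up for illustration and are not the real RSC_WORK_FETCH class; it assumes only that clear_request() zeroes the request fields before each work fetch pass.

```cpp
// Hypothetical, simplified stand-in for the per-resource work fetch state.
// The real RSC_WORK_FETCH class in the client has more members than this.
struct RscWorkFetchSketch {
    double req_secs = 0.0;         // seconds of work to request
    int req_instances = 0;         // idle instances to request work for
    double estimated_delay = 0.0;  // value sent to the scheduler

    // Variant without the reset: a value left over from an earlier
    // pass survives into the next request.
    void clear_request_without_reset() {
        req_secs = 0.0;
        req_instances = 0;
    }

    // Variant with the line added back: every request starts from zero.
    void clear_request_with_reset() {
        req_secs = 0.0;
        req_instances = 0;
        estimated_delay = 0.0;
    }
};
```

In the first variant, a large estimated_delay stays in place unless something else overwrites it before the request is built, which would match the behaviour I saw before patching.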

Before applying the patch a scheduler request from malariacontrol.net had

    <work_req_seconds>17280.000000</work_req_seconds>
    <cpu_req_secs>17280.000000</cpu_req_secs>
    <cpu_req_instances>0</cpu_req_instances>
    <estimated_delay>2441795.222059</estimated_delay>

After the patch it had

    <work_req_seconds>17280.000000</work_req_seconds>
    <cpu_req_secs>17280.000000</cpu_req_secs>
    <cpu_req_instances>0</cpu_req_instances>
    <estimated_delay>0.000000</estimated_delay>
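
The difference between the two estimated_delay values matters because the server uses that field for a crude deadline check. The sketch below is an assumption about the shape of such a check, not the scheduler's actual code; the function name and the parameters est_runtime and delay_bound are hypothetical.

```cpp
// Hypothetical sketch of a crude deadline check: a job is treated as
// infeasible if it cannot start until estimated_delay seconds from now
// and would then finish after its deadline.
bool projected_to_miss_deadline(double estimated_delay,  // seconds, from the request
                                double est_runtime,      // seconds, assumed job runtime
                                double delay_bound) {    // seconds until the deadline
    return estimated_delay + est_runtime > delay_bound;
}
```

With an estimated delay of 2441795 seconds (roughly 28 days), a check of this form rejects any job whose deadline is closer than that, whatever its runtime; with a delay of 0, only the runtime counts.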

The attached file has the rr_sim and work_fetch_debug messages relating to these requests.


------------------------------------------------------------------------

_______________________________________________
boinc_alpha mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.



17-Jul-2009 09:30:27 [---] [rr_sim] rr_sim start: work_buf_total 86400.00
17-Jul-2009 09:30:27 [climateprediction.net] [rr_sim] 0.00: starting hadcm3istd_0c10_1920_160_15936522_2
17-Jul-2009 09:30:27 [climateprediction.net] [rr_sim] 0.00: starting hadam3p_nehj_1989_2_006097625_3
17-Jul-2009 09:30:27 [CPDN Beta] [rr_sim] 0.00: starting hadcm3l_ckgx_2040_2_000013390_4
17-Jul-2009 09:30:27 [CPDN Beta] [rr_sim] 0.00: starting hadcm3l_cnjo_2040_2_000013434_4
17-Jul-2009 09:30:27 [malariacontrol.net] [rr_sim] 0.00: starting wu_670_520_4988_0_1247763163_0
17-Jul-2009 09:30:27 [World Community Grid] [rr_sim] 0.00: starting R00327_21ec2dc5eb08ee02b310e183d402f0e8_03_005_6
17-Jul-2009 09:30:27 [malariacontrol.net] [rr_sim] 0.00: wu_670_520_4988_0_1247763163_0 finishes after 11237.93 (6148.23G/0.55G)
17-Jul-2009 09:30:27 [World Community Grid] [rr_sim] 11237.93: R00327_21ec2dc5eb08ee02b310e183d402f0e8_03_005_6 finishes after 33000.71 (36109.12G/1.09G)
17-Jul-2009 09:30:27 [climateprediction.net] [rr_sim] 44238.64: hadam3p_nehj_1989_2_006097625_3 finishes after 1172749.10 (609909.64G/0.52G)
17-Jul-2009 09:30:27 [CPDN Beta] [rr_sim] 1216987.74: hadcm3l_ckgx_2040_2_000013390_4 finishes after 2855807.19 (1562401.14G/0.55G)
17-Jul-2009 09:30:27 [CPDN Beta] [rr_sim] 4072794.92: hadcm3l_cnjo_2040_2_000013434_4 finishes after 705529.81 (771985.30G/1.09G)
17-Jul-2009 09:30:27 [climateprediction.net] [rr_sim] 4778324.74: hadcm3istd_0c10_1920_160_15936522_2 finishes after 306899.15 (341430.71G/1.11G)
17-Jul-2009 09:30:27 [climateprediction.net] chosen: CPU major shortfall
17-Jul-2009 09:30:27 [---] [wfd] ------- start work fetch state -------
17-Jul-2009 09:30:27 [---] [wfd] target work buffer: 43200.00 + 43200.00 sec
17-Jul-2009 09:30:27 [---] [wfd] CPU: shortfall 0.00 nidle 0.00 est. delay 0.00 RS fetchable 700.00 runnable 700.00
17-Jul-2009 09:30:27 [climateprediction.net] [wfd] CPU: fetch share 0.29 debt 0.00 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:27 [CPDN Beta] [wfd] CPU: fetch share 0.29 debt -2662.72 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:27 [...@home] [wfd] CPU: fetch share 0.00 debt 0.00 backoff dt 0.00 int 0.00 (no new tasks)
17-Jul-2009 09:30:27 [malariacontrol.net] [wfd] CPU: fetch share 0.14 debt -26098.03 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:27 [World Community Grid] [wfd] CPU: fetch share 0.29 debt -14623.23 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:27 [climateprediction.net] [wfd] overall_debt 0
17-Jul-2009 09:30:27 [CPDN Beta] [wfd] overall_debt -2663
17-Jul-2009 09:30:27 [...@home] [wfd] overall_debt 0
17-Jul-2009 09:30:27 [malariacontrol.net] [wfd] overall_debt -26098
17-Jul-2009 09:30:27 [World Community Grid] [wfd] overall_debt -14623
17-Jul-2009 09:30:27 [---] [wfd] ------- end work fetch state -------
17-Jul-2009 09:30:27 [climateprediction.net] [wfd] request: CPU (0.00 sec, 0) CUDA (0.00 sec, 0)
17-Jul-2009 09:30:27 [climateprediction.net] Sending scheduler request: To fetch work.
17-Jul-2009 09:30:27 [climateprediction.net] Not reporting or requesting tasks
17-Jul-2009 09:30:32 [climateprediction.net] Scheduler request completed
17-Jul-2009 09:30:32 [---] [work_fetch_debug] Request work fetch: RPC complete
17-Jul-2009 09:30:37 [---] [rr_sim] rr_sim start: work_buf_total 86400.00
17-Jul-2009 09:30:37 [climateprediction.net] [rr_sim] 0.00: starting hadcm3istd_0c10_1920_160_15936522_2
17-Jul-2009 09:30:37 [climateprediction.net] [rr_sim] 0.00: starting hadam3p_nehj_1989_2_006097625_3
17-Jul-2009 09:30:37 [CPDN Beta] [rr_sim] 0.00: starting hadcm3l_ckgx_2040_2_000013390_4
17-Jul-2009 09:30:37 [CPDN Beta] [rr_sim] 0.00: starting hadcm3l_cnjo_2040_2_000013434_4
17-Jul-2009 09:30:37 [malariacontrol.net] [rr_sim] 0.00: starting wu_670_520_4988_0_1247763163_0
17-Jul-2009 09:30:37 [World Community Grid] [rr_sim] 0.00: starting R00327_21ec2dc5eb08ee02b310e183d402f0e8_03_005_6
17-Jul-2009 09:30:37 [malariacontrol.net] [rr_sim] 0.00: wu_670_520_4988_0_1247763163_0 finishes after 11169.30 (6110.68G/0.55G)
17-Jul-2009 09:30:37 [World Community Grid] [rr_sim] 11169.30: R00327_21ec2dc5eb08ee02b310e183d402f0e8_03_005_6 finishes after 33069.33 (36184.21G/1.09G)
17-Jul-2009 09:30:37 [climateprediction.net] [rr_sim] 44238.64: hadam3p_nehj_1989_2_006097625_3 finishes after 1172749.05 (609909.64G/0.52G)
17-Jul-2009 09:30:37 [CPDN Beta] [rr_sim] 1216987.69: hadcm3l_ckgx_2040_2_000013390_4 finishes after 2855807.07 (1562401.14G/0.55G)
17-Jul-2009 09:30:37 [CPDN Beta] [rr_sim] 4072794.76: hadcm3l_cnjo_2040_2_000013434_4 finishes after 705529.78 (771985.30G/1.09G)
17-Jul-2009 09:30:37 [climateprediction.net] [rr_sim] 4778324.54: hadcm3istd_0c10_1920_160_15936522_2 finishes after 306903.00 (341435.00G/1.11G)
17-Jul-2009 09:30:37 [climateprediction.net] chosen: CPU major shortfall
17-Jul-2009 09:30:37 [---] [wfd] ------- start work fetch state -------
17-Jul-2009 09:30:37 [---] [wfd] target work buffer: 43200.00 + 43200.00 sec
17-Jul-2009 09:30:37 [---] [wfd] CPU: shortfall 0.00 nidle 0.00 est. delay 0.00 RS fetchable 700.00 runnable 700.00
17-Jul-2009 09:30:37 [climateprediction.net] [wfd] CPU: fetch share 0.29 debt 0.00 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:37 [CPDN Beta] [wfd] CPU: fetch share 0.29 debt -2662.72 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:37 [...@home] [wfd] CPU: fetch share 0.00 debt 0.00 backoff dt 0.00 int 0.00 (no new tasks)
17-Jul-2009 09:30:37 [malariacontrol.net] [wfd] CPU: fetch share 0.14 debt -26098.03 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:37 [World Community Grid] [wfd] CPU: fetch share 0.29 debt -14623.23 backoff dt 0.00 int 0.00
17-Jul-2009 09:30:37 [climateprediction.net] [wfd] overall_debt 0
17-Jul-2009 09:30:37 [CPDN Beta] [wfd] overall_debt -2663
17-Jul-2009 09:30:37 [...@home] [wfd] overall_debt 0
17-Jul-2009 09:30:37 [malariacontrol.net] [wfd] overall_debt -26098
17-Jul-2009 09:30:37 [World Community Grid] [wfd] overall_debt -14623
17-Jul-2009 09:30:37 [---] [wfd] ------- end work fetch state -------
17-Jul-2009 09:30:37 [climateprediction.net] [wfd] request: CPU (0.00 sec, 0) CUDA (0.00 sec, 0)
17-Jul-2009 09:30:37 [climateprediction.net] Sending scheduler request: To fetch work.
17-Jul-2009 09:30:37 [climateprediction.net] Not reporting or requesting tasks
17-Jul-2009 09:30:42 [climateprediction.net] Scheduler request completed
17-Jul-2009 09:30:42 [---] [work_fetch_debug] Request work fetch: RPC complete
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
