There are two cases where we do not wait for a checkpoint when
time-slicing.

1)  When a task is being pre-empted by a high priority task.
2)  When a task is being swapped out so that a multi-threaded task can
get swapped in.  (OK, we wait for one of the tasks we are pre-empting).

Some projects can write a checkpoint file at intervals of a few seconds
to a minute or two.  If BOINC set the "OK to checkpoint" flag and then
waited for a short time, we might get more projects to checkpoint.  How much
time might depend on which event is occurring.  The rule of thumb is that
UI lag of more than 7 seconds is usually unacceptable unless there is some
warning and possibly some way out.  Some of the operations have to happen
instantly (hibernation).  For others there can be a delay.

I would propose that the following wait times might be appropriate:
1)  When swapping out for a high priority task.  Up to 1 minute.
2)  When swapping out for a multi-threaded task.  Up to the checkpoint
interval.  Suspend each task as it checkpoints.
3)  Hibernation.  Instant.
4)  Shutting down.  I believe we currently wait for 30 seconds.  If so, we
could set the checkpoint flag at the beginning of the 30 seconds.
5)  User suspending activity.  7 seconds.
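
The list above could be sketched roughly as follows.  This is only an
illustration in Python, not client code: the Event names, the Task class
and its fields, and wait_for_checkpoint() are all made up for the example,
and the 30 second shutdown figure is the unconfirmed value from item 4.

```python
import time
from enum import Enum


class Event(Enum):
    # Hypothetical names for the five cases listed above.
    HIGH_PRIORITY_PREEMPT = 1
    MULTITHREAD_SWAP = 2
    HIBERNATE = 3
    SHUTDOWN = 4
    USER_SUSPEND = 5


class Task:
    # Hypothetical stand-in for a running task; not a BOINC structure.
    def __init__(self):
        self.ok_to_checkpoint = False
        self.checkpointed = False
        self.suspended = False


def max_wait_seconds(event, checkpoint_interval):
    """Longest time to wait for a checkpoint, per the proposed list."""
    limits = {
        Event.HIGH_PRIORITY_PREEMPT: 60.0,
        Event.MULTITHREAD_SWAP: float(checkpoint_interval),
        Event.HIBERNATE: 0.0,
        Event.SHUTDOWN: 30.0,  # assumes the current shutdown wait is 30 s
        Event.USER_SUSPEND: 7.0,
    }
    return limits[event]


def wait_for_checkpoint(task, limit, now=time.monotonic,
                        sleep=time.sleep, poll=0.1):
    """Set the flag, then poll until the task checkpoints or time runs out.

    The task is suspended either way.  Returns True if a checkpoint
    was seen before the limit expired.
    """
    task.ok_to_checkpoint = True
    deadline = now() + limit
    while not task.checkpointed and now() < deadline:
        sleep(poll)
    task.suspended = True
    return task.checkpointed
```

With a limit of 0 (hibernation) the loop body never runs, so the task is
suspended immediately, which matches item 3.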

These will catch a different number of processes in each case.
jm7


From:      David Anderson <[email protected]>
To:        <[email protected]>
Date:      07/16/2012 04:55 PM
Subject:   Re: [boinc_dev] [boinc_alpha] Tasks resume with same fraction done
Sent by:   <[email protected]>

Many (most?) applications can checkpoint only at specific moments
(e.g. completion of an outer loop) that may occur only every few minutes.

When a job is ready to be preempted because of time-slicing,
the scheduler waits until it checkpoints.
So that's not an issue.

The other cases are preempting because the user suspended activity,
the client is exiting, or the system is hibernating.
We could add a mechanism to request apps to checkpoint then,
but it would benefit only those apps that can checkpoint at any time.
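
As a rough way to see how much such a request could help, here is a small
Python sketch.  It assumes (this is not from the client) that an app
reaches a checkpoint opportunity at a regular interval and that the
suspend request arrives at a uniformly random point within that interval;
the function names are made up for the example.

```python
def fraction_caught(grace_seconds, checkpoint_interval):
    """Chance that a checkpoint opportunity falls inside the grace period.

    Under the uniform-arrival assumption, the time until the next
    opportunity is spread evenly over the interval, so the chance is
    grace / interval, capped at 1.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    return min(1.0, max(0.0, grace_seconds) / checkpoint_interval)


def expected_loss_no_wait(checkpoint_interval):
    """Average work lost when the task is suspended without waiting."""
    return checkpoint_interval / 2.0
```

For an app that can only checkpoint every 5 minutes, a 7 second grace
period would catch about 2% of suspends under these assumptions, while an
app that can checkpoint every few seconds would nearly always be caught.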

-- David

On 16-Jul-2012 12:45 PM, Jon Sonntag wrote:
> Shouldn't time_to_checkpoint return true prior to BOINC suspending the
> task?  Then, only after checkpoint_completed is set, actually suspend the
> task?  Or, is it up to the application to do a checkpoint by checking the
> boinc_status and doing a checkpoint even if not asked to do so when BOINC
> is suspending the task?  Otherwise, a couple of minutes of work would be
> lost on average every time it suspends.  (5 minutes per checkpoint and
> switching between 2 projects every hour = 2.5 minutes lost on average.
> For long tasks, that could add up to a lot of time.)  If it is the
> developer's job to checkpoint on suspend, I would suggest adding that code
> to the sample apps, as startup projects often use the uppercase sample
> apps as a template for their own code.
>
> Note: This topic started on BOINC_ALPHA, but I felt BOINC_DEV was a more
> appropriate place to get more clarification and/or expand the discussion.
>
> Jon Sonntag
> [email protected]
>
>> -----Original Message-----
>> From: [email protected] [mailto:boinc_alpha-
>> [email protected]] On Behalf Of David Anderson
>> Sent: Monday, July 16, 2012 2:28 PM
>> To: [email protected]
>> Subject: Re: [boinc_alpha] Tasks resume with same fraction done
>>
>> Tasks resume from wherever they last checkpointed.
>> This is true whether you install a new version, or simply stop/start the
>> client.
>> -- David
>>
>> On 16-Jul-2012 7:21 AM, Jan Pillár wrote:
>>> Hi,
>>>
>>>
>>> I have a question about installing new version of BOINC over older one
>>> - tasks should resume with same fraction done. Should the tasks start
>>> with exactly the same fraction done or is there an acceptable
>>> tolerance?
>>>
>>> For example, before installation of new version my tasks were at 34 %,
>>> after installation they were at 31 %. Is that OK? Does it have
>>> anything to do with "Task checkpoint to disk" settings?
>>>
>>> Kind Regards,
>>>
>>> Jan
>>> _______________________________________________
>>> boinc_alpha mailing list
>>> [email protected]
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.
>>>
>>
>
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
>
