Except that the BOINC XML parser will not cope with this, as it does not handle attributes.
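To illustrate the constraint (a hypothetical element-only helper, not the actual BOINC parsing code): a parser that only matches <tag>value</tag> pairs never sees values carried as attributes on a self-closing tag, so the attribute form below is simply skipped.

// Illustration only -- a hypothetical element-only parser, not the BOINC parser API.
#include <cstdio>
#include <string>

// Extract the text between <tag> and </tag>, if both appear on the line.
static bool parse_element(const std::string& line, const std::string& tag, std::string& out) {
    std::string open = "<" + tag + ">";
    std::string close = "</" + tag + ">";
    size_t a = line.find(open);
    size_t b = line.find(close);
    if (a == std::string::npos || b == std::string::npos || b < a) return false;
    a += open.size();
    out = line.substr(a, b - a);
    return true;
}

int main() {
    std::string v;
    // The element form is found:
    if (parse_element("<v>40000</v>", "v", v)) std::printf("element form: %s\n", v.c_str());
    // The attribute form is not -- there is no <v>...</v> pair to match:
    if (!parse_element("<val d=\"2013-04-10\" v=\"40000\"/>", "v", v)) std::printf("attribute form: not parsed\n");
    return 0;
}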
From: Jon Sonntag [mailto:[email protected]]
Sent: Thursday, April 11, 2013 5:23 PM
To: McLeod, John
Cc: [email protected]
Subject: Re: [boinc_dev] Improved deadline calculation.

I would vote against parallel arrays since that strays from the purpose of representing data as XML. Instead, to save space on disk, use attributes rather than elements:

<task_report_duration mean="40000" stddev="15000">
    <values>
        <val d="date/time" v="time duration"/>
        <val d="date/time" v="time duration"/>
    </values>
</task_report_duration>

In addition, the state would need to be checked when data is deleted, so that results with errors, or those aborted by the user or the server, aren't taken into account.

On Wed, Apr 10, 2013 at 8:15 AM, McLeod, John <[email protected]> wrote:

I was looking at the list of tasks and saw this one: "Keep track of the statistics of how long it takes to upload files, and to report results. Use that info to improve compute deadlines (e.g., subtract the 2 sigma point for both)."

I believe that this can be simplified somewhat. What the client cares about is the time from task completion to the end of the report, per project. With that measure, the client does not care if the upload is slow, if uploads have to retry many times, if the report is slow, if the client is not connected to the internet for a few days, or if the project went offline for a week. It would have to be done per project, though, as different projects can behave very differently in upload speed and time offline.

The largest question is how long we keep statistics for each project. The fact that a project was offline for a month 5 years ago is probably meaningless. Losing the fact that a project goes offline for a week every 8 weeks would probably be unhelpful as well. I would propose keeping 3 months' worth of data for each project, or 10 completion-to-report durations, whichever is greater.

I would propose re-calculating the average and standard deviation only when data is added or deleted. Adding new data would happen every time a task is completed. Deleting data would happen only when data is added, and when the client is shutting down (or, if it makes more sense, when it is starting up).

So the data that would be stored would be:

<task_report_duration>
    <mean>40000</mean>
    <stdev>15000</stdev>
    <values>
        <val>
            <d>date/time</d>
            <v>time duration</v>
        </val>
        <val>
            <d>date/time</d>
            <v>time duration</v>
        </val>
        ...
    </values>
</task_report_duration>

Or we could save the values and dates as two parallel arrays to save space on disk:

<values>duration; duration; ...</values>
<dates>date/time; date/time; ...</dates>

I remember doing some work on this a few years ago, and having it rejected because I did not clear it first.
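A minimal sketch of the bookkeeping described above (illustrative names and thresholds, not existing BOINC client code): per project, keep (completion date, completion-to-report duration) samples, prune to the greater of roughly 3 months of data or the 10 most recent samples, and recompute the mean and standard deviation only when samples are added or removed.

// Sketch only -- illustrative names and thresholds, not existing BOINC code.
#include <cmath>
#include <ctime>
#include <vector>

struct ReportSample {
    std::time_t completed;  // when the task finished computing
    double duration;        // seconds from completion to the end of the report
};

// Per-project history of completion-to-report durations.
struct TaskReportStats {
    std::vector<ReportSample> samples;
    double mean = 0;
    double stdev = 0;

    static constexpr std::time_t KEEP_SECONDS = 90 * 24 * 3600;  // roughly 3 months
    static constexpr size_t KEEP_MIN = 10;                       // but never fewer than 10 samples

    // Called whenever a completed task has been reported; also prunes and recomputes.
    void add(std::time_t completed, double duration) {
        samples.push_back({completed, duration});
        prune(completed);
        recompute();
    }

    // Drop samples older than ~3 months, keeping at least the 10 most recent.
    // Could equally be called at client shutdown (or startup), as proposed above.
    void prune(std::time_t now) {
        while (samples.size() > KEEP_MIN && now - samples.front().completed > KEEP_SECONDS) {
            samples.erase(samples.begin());
        }
    }

    // Recomputed only when samples are added or removed.
    void recompute() {
        mean = stdev = 0;
        if (samples.empty()) return;
        for (const ReportSample& s : samples) mean += s.duration;
        mean /= samples.size();
        double var = 0;
        for (const ReportSample& s : samples) var += (s.duration - mean) * (s.duration - mean);
        stdev = std::sqrt(var / samples.size());
    }

    // One reading of "subtract the 2 sigma point": the margin to allow before the compute deadline.
    double two_sigma_margin() const { return mean + 2 * stdev; }
};

Samples for tasks that erred or were aborted by the user or the server would simply never be added, which covers the state check mentioned in the reply above.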
