I would vote against parallel arrays since that strays from the purpose of
representing data as XML. Instead, to save space on disk, use attributes
rather than elements:
<task_report_duration mean="40000" stddev="15000">
<values>
<val d="date/time" v="time duration"/>
<val d="date/time" v="time duration"/>
</values>
</task_report_duration>
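For illustration only, here is a minimal Python sketch of reading the attribute form back in. It assumes, hypothetically, that "date/time" is stored as Unix epoch seconds and "time duration" as seconds; neither encoding is settled in this thread, and `parse_report_durations` is a made-up name, not an existing client function.

```python
import xml.etree.ElementTree as ET

# Sample document in the attribute form; the numeric values are invented
# and the epoch-seconds / seconds encodings are assumptions.
SAMPLE = """
<task_report_duration mean="40000" stddev="15000">
  <values>
    <val d="1365580800" v="38000"/>
    <val d="1365584400" v="42000"/>
  </values>
</task_report_duration>
"""

def parse_report_durations(xml_text):
    """Return (mean, stddev, [(timestamp, duration), ...])."""
    root = ET.fromstring(xml_text)
    mean = float(root.get("mean"))
    stddev = float(root.get("stddev"))
    # Each <val> carries its date in "d" and its duration in "v".
    samples = [(float(v.get("d")), float(v.get("v"))) for v in root.iter("val")]
    return mean, stddev, samples

mean, stddev, samples = parse_report_durations(SAMPLE)
```

Each sample stays a single self-describing element, which is the property parallel arrays would give up.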
In addition, the task state would need to be checked when data is deleted so
that tasks that failed with errors or were aborted by the user or the server
aren't taken into account.
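As a rough sketch of how the retention rule from the quoted proposal and the state check could fit together (Python, not client code): the 90-day approximation of "3 months", the `ok` flag standing in for task state, and all function names are assumptions of mine. `adjusted_deadline` is only one possible reading of "subtract the 2 sigma point".

```python
import statistics

RETENTION_SECONDS = 90 * 24 * 3600  # "3 months", approximated as 90 days (assumption)
MIN_SAMPLES = 10                    # keep at least this many recent samples

def prune(samples, now):
    """samples: list of (timestamp, duration, ok) tuples.

    Keeps either everything inside the retention window or the newest
    MIN_SAMPLES entries, whichever set is larger.
    """
    # Ignore tasks that errored or were aborted; 'ok' is a hypothetical flag.
    good = sorted((s for s in samples if s[2]), key=lambda s: s[0])
    recent = [s for s in good if now - s[0] <= RETENTION_SECONDS]
    newest = good[-MIN_SAMPLES:]
    return recent if len(recent) >= len(newest) else newest

def summarize(samples):
    """Recompute mean and sample standard deviation of the durations."""
    durations = [s[1] for s in samples]
    if not durations:
        return 0.0, 0.0
    mean = statistics.mean(durations)
    stddev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stddev

def adjusted_deadline(deadline, mean, stddev):
    # One reading of "subtract the 2 sigma point": leave mean + 2*stddev of slack.
    return deadline - (mean + 2 * stddev)
```

Calling `summarize` only after `prune` or after appending a sample matches the idea of recalculating only when data is added or deleted.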
On Wed, Apr 10, 2013 at 8:15 AM, McLeod, John <[email protected]> wrote:
> I was looking at the list of tasks and saw the one: "Keep track of the
> statistics of how long it takes to upload files, and to report results. Use
> that info to improve compute deadlines (e.g., subtract the 2 sigma point
> for both)."
> I believe that this can be simplified somewhat. What the client cares
> about is the time from task completion to the end of the report per
> project. That measure means that the client does not care if the upload is
> slow, if uploads have to retry many times, if the report is slow, the
> client is not connected to the internet for a few days, or if the project
> went offline for a week. It would have to be done per project though as
> the different projects can behave very differently for upload speeds, and
> times offline.
> The largest question is how long do we keep statistics for each project.
> The fact that a task was offline for a month 5 years ago is probably
> meaningless. Losing the fact that every 8 weeks a project is offline for a
> week is probably less than useful as well.
> I would propose keeping 3 months' worth of data for each project or 10
> completion-to-report durations (whichever is greater). I would propose
> re-calculating the average and stdev only when new data is added or
> deleted. Adding new data would happen every time a task is completed.
> Deleting data would happen only when data is added, and when the client is
> shutting down (or if it makes more sense, when it is starting up). So the
> data that would be stored would be:
> <task_report_duration>
> <mean>40000</mean>
> <stdev>15000</stdev>
> <values>
> <val>
> <d>date/time</d>
> <v>time duration</v>
> </val>
> <val>
> <d>date/time</d>
> <v>time duration</v>
> </val>
> ...
> </values>
> </task_report_duration>
> Or we could save the values and dates as 2 parallel arrays to save space
> on disk.
> <values>duration; duration; ...</values>
> <dates>date/time; date/time; ...</dates>
> I remember doing some work on this a few years ago, and having it rejected
> because I did not clear it first.
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
>