Except that the BOINC XML parser will not cope with this, as it does not handle attributes.
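To illustrate the constraint (a hypothetical element-only helper, not the actual BOINC parsing code): a parser that only matches <tag>value</tag> pairs never sees values carried as attributes on a self-closing tag, so the attribute form below is simply skipped.

// Illustration only -- a hypothetical element-only parser, not the BOINC parser API.
#include <cstdio>
#include <string>

// Extract the text between <tag> and </tag>, if both appear on the line.
static bool parse_element(const std::string& line, const std::string& tag, std::string& out) {
    std::string open = "<" + tag + ">";
    std::string close = "</" + tag + ">";
    size_t a = line.find(open);
    size_t b = line.find(close);
    if (a == std::string::npos || b == std::string::npos || b < a) return false;
    a += open.size();
    out = line.substr(a, b - a);
    return true;
}

int main() {
    std::string v;
    // The element form is found:
    if (parse_element("<v>40000</v>", "v", v)) std::printf("element form: %s\n", v.c_str());
    // The attribute form is not -- there is no <v>...</v> pair to match:
    if (!parse_element("<val d=\"2013-04-10\" v=\"40000\"/>", "v", v)) std::printf("attribute form: not parsed\n");
    return 0;
}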
From: Jon Sonntag [mailto:[email protected]]
Sent: Thursday, April 11, 2013 5:23 PM
To: McLeod, John
Cc: [email protected]
Subject: Re: [boinc_dev] Improved deadline calculation.

I would vote against parallel arrays since that strays from the purpose of representing data as XML. Instead, to save space on disk, use attributes rather than elements:

<task_report_duration mean="40000" stddev="15000">
    <values>
        <val d="date/time" v="time duration"/>
        <val d="date/time" v="time duration"/>
    </values>
</task_report_duration>

In addition, the state would need to be checked when data is deleted, so that results with errors, or those aborted by the user or the server, aren't taken into account.

On Wed, Apr 10, 2013 at 8:15 AM, McLeod, John <[email protected]> wrote:

I was looking at the list of tasks and saw this one: "Keep track of the statistics of how long it takes to upload files, and to report results. Use that info to improve compute deadlines (e.g., subtract the 2 sigma point for both)."

I believe that this can be simplified somewhat. What the client cares about is the time from task completion to the end of the report, per project. With that measure, the client does not care if the upload is slow, if uploads have to retry many times, if the report is slow, if the client is not connected to the internet for a few days, or if the project went offline for a week. It would have to be done per project, though, as different projects can behave very differently in upload speed and time offline.

The largest question is how long we keep statistics for each project. The fact that a project was offline for a month 5 years ago is probably meaningless. Losing the fact that a project goes offline for a week every 8 weeks would probably be unhelpful as well. I would propose keeping 3 months' worth of data for each project, or 10 completion-to-report durations, whichever is greater.

I would propose re-calculating the average and standard deviation only when data is added or deleted. Adding new data would happen every time a task is completed. Deleting data would happen only when data is added, and when the client is shutting down (or, if it makes more sense, when it is starting up).

So the data that would be stored would be:

<task_report_duration>
    <mean>40000</mean>
    <stdev>15000</stdev>
    <values>
        <val>
            <d>date/time</d>
            <v>time duration</v>
        </val>
        <val>
            <d>date/time</d>
            <v>time duration</v>
        </val>
        ...
    </values>
</task_report_duration>

Or we could save the values and dates as two parallel arrays to save space on disk:

<values>duration; duration; ...</values>
<dates>date/time; date/time; ...</dates>

I remember doing some work on this a few years ago, and having it rejected because I did not clear it first.
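A minimal sketch of the bookkeeping described above (illustrative names and thresholds, not existing BOINC client code): per project, keep (completion date, completion-to-report duration) samples, prune to the greater of roughly 3 months of data or the 10 most recent samples, and recompute the mean and standard deviation only when samples are added or removed.

// Sketch only -- illustrative names and thresholds, not existing BOINC code.
#include <cmath>
#include <ctime>
#include <vector>

struct ReportSample {
    std::time_t completed;  // when the task finished computing
    double duration;        // seconds from completion to the end of the report
};

// Per-project history of completion-to-report durations.
struct TaskReportStats {
    std::vector<ReportSample> samples;
    double mean = 0;
    double stdev = 0;

    static constexpr std::time_t KEEP_SECONDS = 90 * 24 * 3600;  // roughly 3 months
    static constexpr size_t KEEP_MIN = 10;                       // but never fewer than 10 samples

    // Called whenever a completed task has been reported; also prunes and recomputes.
    void add(std::time_t completed, double duration) {
        samples.push_back({completed, duration});
        prune(completed);
        recompute();
    }

    // Drop samples older than ~3 months, keeping at least the 10 most recent.
    // Could equally be called at client shutdown (or startup), as proposed above.
    void prune(std::time_t now) {
        while (samples.size() > KEEP_MIN && now - samples.front().completed > KEEP_SECONDS) {
            samples.erase(samples.begin());
        }
    }

    // Recomputed only when samples are added or removed.
    void recompute() {
        mean = stdev = 0;
        if (samples.empty()) return;
        for (const ReportSample& s : samples) mean += s.duration;
        mean /= samples.size();
        double var = 0;
        for (const ReportSample& s : samples) var += (s.duration - mean) * (s.duration - mean);
        stdev = std::sqrt(var / samples.size());
    }

    // One reading of "subtract the 2 sigma point": the margin to allow before the compute deadline.
    double two_sigma_margin() const { return mean + 2 * stdev; }
};

Samples for tasks that erred or were aborted by the user or the server would simply never be added, which covers the state check mentioned in the reply above.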
