I checked in a fix (at least, I tested it and it seemed to work).
-- David

On 25-Jan-2013 5:32 PM, Daniel Carrion wrote:
> Just wondering if any of the boinc devs have considered this issue any
> further? We usually use the latest wrapper at boinc/sample as it seems to
> be receiving new features, however, if this CPU time calc problem isn't
> going to be considered as a real issue/bug we may have to fork...
>
> Can someone from BOINC dev team indicate either way so I know what path to
> go down with this?
>
> To summarise the issue again: CPU time is calculated incorrectly as the
> wrapper checkpoints and moves on to the next task. It affects UNIX machines,
> i.e. Linux, Darwin, Android, etc. The debug output below shows the incorrect
> checkpoint_cpu_time calculation as tasks switch.
>
> =========================================================================================
> $tail -f stderr.txt
> wrapper: starting
> 17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
> checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
> 17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
> checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
> 18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
> checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
> 18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
> ==========================================================================================
>
> --- Daniel
>
> On Thu, Jan 10, 2013 at 10:06 AM, Daniel Carrion <[email protected]>wrote:
>
>> On my Linux machine:
>>
>> Cloned the main git repo. Compiled BOINC followed by sample wrapper.
>> Copied wrapper over to project dir in place of existing/old wrapper -
>> Fairly significant size difference. I'm guessing it's that zipping
>> functionality.
>>
>> Unfortunately, the same problem seems to be happening, i.e.:
>>
>> ----------------------
>>
>>
>> daniel@snm-boi01:/var/lib/boinc/slots/0# tail -f wrapper_checkpoint.txt 2>/dev/null
>> 1 448.900054
>> 2 1351.808482 <-- should be 904
>> 3 2710.013364
>> daniel@snm-boi01:/var/lib/boinc/slots/0# cat stderr.txt
>> wrapper: starting
>> 17:31:17 (30673): wrapper: running
>> ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (1
>> filters.dat observations.dat)
>> 17:38:52 (30673): wrapper: running
>> ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (2
>> filters.dat observations.dat)
>> 17:46:27 (30673): wrapper: running
>> ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (3
>> filters.dat observations.dat)
>> 17:54:04 (30673): wrapper: running
>> ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (4
>> filters.dat observations.dat)
>>
>> ------------------------
>>
>> Notice the checkpoint times are way off the mark. E.g. 17:54:04 - 17:31:17
>> is 1367 seconds of wall-clock time, not 2710 seconds. CPU time is being
>> added incorrectly as sub-tasks finish, checkpoint and move on to the next.
>>
>> I don't have immediate access to Windows build environment for BOINC, so I
>> can't test if that "0 second" report time problem is still occurring with
>> the latest wrapper. However, I'm more concerned about that incorrect CPU
>> checkpoint time at the moment.
>>
>> I just want to re-emphasise that this issue does not occur with
>> server_stable branch wrapper release.
>>
>> Here's some actual live runs to show you the difference between CPU time
>> between versions:
>>
>> Wrong CPU time (most recent version):
>> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1492571
>> Right CPU time (old version and with fix):
>> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1487356
>>
>>
>> On Mon, Jan 7, 2013 at 4:07 PM, David Anderson <[email protected]>wrote:
>>
>>> That looks like an old version of wrapper.cpp.
>>> Try the one in trunk.
>>> -- David
>>>
>>> On 06-Jan-2013 7:23 PM, Daniel Carrion wrote:
>>>> This concerns wrapper.cpp provided under
>>>> boinc/samples/wrapper/wrapper.cpp.
>>>> Seems like we're getting wrong CPU time calculations under Linux, and I
>>>> believe the same goes for Mac.
>>>>
>>>> Section of code this concerns (as subtasks finish in main()):
>>>>
>>>> checkpoint_cpu_time = task.starting_cpu + task.final_cpu_time;
>>>>
>>>> fprintf(stderr,
>>>>     "checkpoint_cpu_time = starting_cpu (%f) + final_cpu_time (%f)\n",
>>>>     task.starting_cpu, task.final_cpu_time);
>>>>
>>>> write_checkpoint(i+1, checkpoint_cpu_time);
>>>>
>>>> Note: I added the above fprintf line for debugging.
>>>>
>>>> We see this in the stderr.txt file as subtasks run (and are checkpointed
>>>> as they finish):
>>>>
>>>> $tail -f stderr.txt
>>>> wrapper: starting
>>>> 17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
>>>> checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
>>>> 17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
>>>> checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
>>>> 18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
>>>> checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
>>>> 18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
>>>>
>>>> See how final_cpu_time makes checkpoint_cpu_time incorrect, and
>>>> therefore also starting_cpu in the next task, since it is set from this
>>>> value. If I change checkpoint_cpu_time to be final_cpu_time only, the
>>>> problem goes away.
>>>>
>>>> Something else that we noticed is that the CPU time reported on Windows
>>>> machines is nearly always 0.0 seconds. Not sure if this is related as I
>>>> haven't looked into it further.
>>>>
>>>> One more thing to note, I don't see this issue on Linux with the wrapper
>>>> provided at server_stable branch on old SVN repo.
>>>>
>>>> I'm hoping that David A. picks this up. Tried to keep it as short as
>>>> possible - let me know if more details are required.
>>>> _______________________________________________
>>>> boinc_dev mailing list
>>>> [email protected]
>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>> To unsubscribe, visit the above URL and
>>>> (near bottom of page) enter your email address.
>>>>
>>>
>>
>>
>
