Confirmed working on Darwin.

Windows is still broken, i.e. it reports 0 seconds of CPU time:
http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1899118
Could CPU time be getting reset every time the wrapper starts a new task?
Run time is fine, though.

Note: I use MinGW to compile on Windows. I had to rip the zipping code out
of the newest sample wrapper, as I couldn't get it to compile properly. This
probably comes down to a lack of motivation to track down exactly what is
needed to build the new boinc_zip with MinGW :).

--- Daniel

---------- Forwarded message ----------
From: Daniel Carrion <[email protected]>
Date: Sat, Jan 26, 2013 at 10:29 PM
Subject: Re: [boinc_dev] Wrapper CPU time woes
To: BOINC Developers Mailing List <[email protected]>


Confirmed working on Linux. Just need to test across rest of platforms now.

-- Daniel

On Sat, Jan 26, 2013 at 5:42 PM, David Anderson <[email protected]> wrote:

> I checked in a fix (at least, I tested it and it seemed to work).
> -- David
>
> On 25-Jan-2013 5:32 PM, Daniel Carrion wrote:
> > Just wondering if any of the BOINC devs have considered this issue any
> > further? We usually use the latest wrapper in boinc/sample as it seems to
> > be receiving new features; however, if this CPU time calculation problem
> > isn't going to be treated as a real issue/bug, we may have to fork...
> >
> > Can someone from the BOINC dev team indicate either way, so I know which
> > path to go down with this?
> >
> > To summarise the issue again: CPU time is calculated incorrectly as the
> > wrapper checkpoints and moves on to the next task. It affects UNIX
> > machines, i.e. Linux, Darwin, Android, etc. Below is debug output showing
> > the incorrect checkpoint_cpu_time calculation as tasks switch.
> >
> >
> > =========================================================================================
> > $ tail -f stderr.txt
> > wrapper: starting
> > 17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
> > checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
> > 17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
> > checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
> > 18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
> > checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
> > 18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
> > =========================================================================================
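A possible mechanism for the Unix-only behaviour, offered as an assumption rather than something confirmed against wrapper.cpp: if the wrapper reads final_cpu_time with getrusage(RUSAGE_CHILDREN), that figure covers every child process reaped so far, so it is already cumulative across subtasks, and adding starting_cpu counts the earlier subtasks again. The POSIX sketch below (Linux and Darwin; it will not build with MinGW) contrasts the per-child CPU time returned by wait4() with the cumulative RUSAGE_CHILDREN figure. The helper names are hypothetical.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Convert a struct timeval to seconds.
static double tv_seconds(const struct timeval& tv) {
    return tv.tv_sec + tv.tv_usec / 1e6;
}

// User + system CPU time recorded in a struct rusage.
static double rusage_cpu(const struct rusage& ru) {
    return tv_seconds(ru.ru_utime) + tv_seconds(ru.ru_stime);
}

// Hypothetical helper: fork a child that spins for `iterations` loop
// iterations, reap it with wait4(), and return THAT child's CPU time only.
// Returns -1.0 on failure.
double run_child_and_get_cpu(long iterations) {
    assert(iterations > 0);
    pid_t pid = fork();
    if (pid < 0) return -1.0;
    if (pid == 0) {
        volatile unsigned long x = 0;  // volatile keeps the loop from being optimized away
        for (long i = 0; i < iterations; ++i) x = x + static_cast<unsigned long>(i);
        _exit(0);
    }
    int status = 0;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) != pid) return -1.0;
    return rusage_cpu(ru);  // per-child figure
}

// Hypothetical helper: CPU time of ALL children reaped so far by this
// process. It grows with every additional child that is waited for.
double all_children_cpu() {
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0) return -1.0;
    return rusage_cpu(ru);  // cumulative figure
}
```

Running two children and then calling all_children_cpu() should give roughly the sum of both per-child values. If final_cpu_time behaves like the cumulative figure, using it on its own (the workaround reported in this thread) is consistent; if it behaved like the per-child figure, starting_cpu + final_cpu_time would be correct.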
> >
> > --- Daniel
> >
> > On Thu, Jan 10, 2013 at 10:06 AM, Daniel Carrion <[email protected]> wrote:
> >
> >> On my Linux machine:
> >>
> >> Cloned the main git repo. Compiled BOINC, followed by the sample wrapper.
> >> Copied the wrapper over to the project dir in place of the existing/old
> >> wrapper. Fairly significant size difference; I'm guessing it's the new
> >> zipping functionality.
> >>
> >> Unfortunately, the same problem seems to be happening, i.e.:
> >>
> >> ----------------------
> >>
> >>
> >> daniel@snm-boi01:/var/lib/boinc/slots/0# tail -f wrapper_checkpoint.txt 2>/dev/null
> >> 1 448.900054
> >> 2 1351.808482 <-- should be 904
> >> 3 2710.013364
> >> daniel@snm-boi01:/var/lib/boinc/slots/0# cat stderr.txt
> >> wrapper: starting
> >> 17:31:17 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (1 filters.dat observations.dat)
> >> 17:38:52 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (2 filters.dat observations.dat)
> >> 17:46:27 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (3 filters.dat observations.dat)
> >> 17:54:04 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (4 filters.dat observations.dat)
> >>
> >> ------------------------
> >>
> >> Notice the checkpoint times are way off the mark. E.g. from 17:31:17 to
> >> 17:54:04 is only 1367 seconds of wall-clock time, yet the third
> >> checkpoint claims 2710 seconds of CPU time. CPU time is being added
> >> incorrectly as sub-tasks finish, checkpoint and move on to the next one.
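A minimal sketch of that wall-clock sanity check, written in C++ to match wrapper.cpp. It assumes fit_sed uses a single core (not stated in this thread) and that both timestamps fall on the same day; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parse an "HH:MM:SS" timestamp (as printed in stderr.txt) into seconds
// since midnight. Returns -1 if the string does not match that format.
int seconds_since_midnight(const std::string& hms) {
    int h = 0, m = 0, s = 0;
    if (std::sscanf(hms.c_str(), "%d:%d:%d", &h, &m, &s) != 3) return -1;
    return h * 3600 + m * 60 + s;
}

// Elapsed wall-clock seconds between two timestamps on the same day.
int elapsed_seconds(const std::string& start, const std::string& end) {
    int a = seconds_since_midnight(start);
    int b = seconds_since_midnight(end);
    assert(a >= 0 && b >= a);  // assumption: no midnight rollover
    return b - a;
}

// A process using at most `threads` cores cannot accumulate more CPU time
// than threads * wall-clock time, so a larger checkpoint value is suspect.
bool cpu_time_plausible(double cpu_seconds, int wall_seconds, int threads) {
    return cpu_seconds <= static_cast<double>(wall_seconds) * threads;
}
```

For the run above, elapsed_seconds("17:31:17", "17:54:04") gives 1367, and cpu_time_plausible(2710.013364, 1367, 1) returns false, flagging the third checkpoint.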
> >>
> >> I don't have immediate access to a Windows build environment for BOINC,
> >> so I can't test whether that "0 second" CPU time problem still occurs
> >> with the latest wrapper. However, I'm more concerned about the incorrect
> >> CPU checkpoint time at the moment.
> >>
> >> I just want to re-emphasise that this issue does not occur with the
> >> server_stable branch wrapper release.
> >>
> >> Here are some actual live runs showing the difference in CPU time
> >> between versions:
> >>
> >> Wrong CPU time (most recent version):
> >> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1492571
> >> Right CPU time (old version and with fix):
> >> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1487356
> >>
> >>
> >> On Mon, Jan 7, 2013 at 4:07 PM, David Anderson <[email protected]> wrote:
> >>
> >>> That looks like an old version of wrapper.cpp.
> >>> Try the one in trunk.
> >>> -- David
> >>>
> >>> On 06-Jan-2013 7:23 PM, Daniel Carrion wrote:
> >>>> This concerns the wrapper.cpp provided under
> >>>> boinc/samples/wrapper/wrapper.cpp. It seems we're getting wrong CPU
> >>>> times under Linux, and I believe the same goes for Mac.
> >>>>
> >>>> Section of code this concerns (lines 804-809 of wrapper.cpp, in main(),
> >>>> as subtasks finish):
> >>>>
> >>>> checkpoint_cpu_time = task.starting_cpu + task.final_cpu_time;
> >>>>
> >>>> fprintf(stderr, "checkpoint_cpu_time = starting_cpu (%f) + final_cpu_time (%f)\n",
> >>>>         task.starting_cpu, task.final_cpu_time);
> >>>>
> >>>> write_checkpoint(i+1, checkpoint_cpu_time);
> >>>>
> >>>> Note: I added the above fprintf line for debugging.
> >>>>
> >>>> We see this in the stderr.txt file as subtasks run (and are
> >>>> checkpointed as they finish):
> >>>>
> >>>> $ tail -f stderr.txt
> >>>> wrapper: starting
> >>>> 17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
> >>>> checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
> >>>> 17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
> >>>> checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
> >>>> 18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
> >>>> checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
> >>>> 18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
> >>>>
> >>>> Note how final_cpu_time makes checkpoint_cpu_time incorrect, and in
> >>>> turn the starting_cpu of the next task, since that is set from this
> >>>> value. final_cpu_time already appears to be cumulative (447, 897, 1350
> >>>> seconds), so adding starting_cpu counts the earlier subtasks twice. If
> >>>> I change checkpoint_cpu_time to be final_cpu_time only, the problem
> >>>> goes away.
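To make the double counting concrete, here is a small C++ model of the two rules. It assumes, as the logged values suggest, that final_cpu_time is cumulative over all subtasks run so far; the per-subtask inputs and the function names are hypothetical and not taken from wrapper.cpp.

```cpp
#include <cassert>
#include <vector>

// Rule from the quoted snippet: checkpoint = starting_cpu + final_cpu_time.
// ASSUMPTION: final_cpu_time is a running total over all subtasks so far.
std::vector<double> buggy_checkpoints(const std::vector<double>& per_task_cpu) {
    std::vector<double> checkpoints;
    double starting_cpu = 0.0;  // carried over from the previous checkpoint
    double final_cpu = 0.0;     // modelled as cumulative CPU time
    for (double t : per_task_cpu) {
        assert(t >= 0.0);  // CPU time cannot be negative
        final_cpu += t;
        double checkpoint = starting_cpu + final_cpu;  // earlier subtasks counted again
        checkpoints.push_back(checkpoint);
        starting_cpu = checkpoint;  // the inflated value feeds the next subtask
    }
    return checkpoints;
}

// Workaround described in this thread: checkpoint = final_cpu_time only.
std::vector<double> fixed_checkpoints(const std::vector<double>& per_task_cpu) {
    std::vector<double> checkpoints;
    double final_cpu = 0.0;
    for (double t : per_task_cpu) {
        assert(t >= 0.0);
        final_cpu += t;
        checkpoints.push_back(final_cpu);
    }
    return checkpoints;
}
```

With three hypothetical subtasks of 450 s each, the buggy rule yields 450, 1350 and 2700 s, close to the 448.9, 1351.8 and 2710.0 s in wrapper_checkpoint.txt, while the final_cpu_time-only rule yields 450, 900 and 1350 s.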
> >>>>
> >>>> Something else we noticed is that the CPU time reported on Windows
> >>>> machines is nearly always 0.0 seconds. Not sure if this is related, as
> >>>> I haven't looked into it further.
> >>>>
> >>>> One more thing to note: I don't see this issue on Linux with the
> >>>> wrapper provided in the server_stable branch of the old SVN repo.
> >>>>
> >>>> I'm hoping that David A. picks this up. I tried to keep this as short
> >>>> as possible; let me know if more details are required.
> >>>> _______________________________________________
> >>>> boinc_dev mailing list
> >>>> [email protected]
> >>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> >>>> To unsubscribe, visit the above URL and
> >>>> (near bottom of page) enter your email address.
> >>>>
> >>>
> >>
> >>
> >
>
