Confirmed working on Darwin. Windows is still broken, i.e. it reports 0 seconds of CPU time: http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1899118. It seems like the CPU time is getting reset every time a new task starts in the wrapper? The run time is fine, though.
Note: I use MinGW to compile on Windows. I had to rip the zipping code out of the newest sample wrapper because I couldn't get it to compile properly. This probably comes down to a lack of motivation to track down exactly what is needed to compile the new boinc_zip build using MinGW :).

--- Daniel

---------- Forwarded message ----------
From: Daniel Carrion <[email protected]>
Date: Sat, Jan 26, 2013 at 10:29 PM
Subject: Re: [boinc_dev] Wrapper CPU time woes
To: BOINC Developers Mailing List <[email protected]>

Confirmed working on Linux. Just need to test across the rest of the platforms now.

-- Daniel

On Sat, Jan 26, 2013 at 5:42 PM, David Anderson <[email protected]> wrote:

> I checked in a fix (at least, I tested it and it seemed to work).
> -- David
>
> On 25-Jan-2013 5:32 PM, Daniel Carrion wrote:
> > Just wondering if any of the BOINC devs have considered this issue any further? We usually use the latest wrapper at boinc/sample as it seems to be receiving new features; however, if this CPU time calculation problem isn't going to be considered a real issue/bug, we may have to fork...
> >
> > Can someone from the BOINC dev team indicate either way so I know what path to go down with this?
> >
> > To summarise the issue again: CPU time is calculated incorrectly as the wrapper checkpoints and moves on to the next task. It affects UNIX machines, i.e. Linux, Darwin, Android, etc. Debug output showing the incorrect checkpoint_cpu_time calculation as tasks switch:
> >
> > =========================================================================================
> > $tail -f stderr.txt
> > wrapper: starting
> > 17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
> > checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
> > 17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
> > checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
> > 18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
> > checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
> > 18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
> > ==========================================================================================
> >
> > --- Daniel
> >
> > On Thu, Jan 10, 2013 at 10:06 AM, Daniel Carrion <[email protected]> wrote:
> >
> >> On my Linux machine:
> >>
> >> Cloned the main git repo. Compiled BOINC followed by the sample wrapper. Copied the wrapper over to the project dir in place of the existing/old wrapper. Fairly significant size difference; I'm guessing it's that zipping functionality.
> >>
> >> Unfortunately... the same problem seems to be happening.
> >> I.e.:
> >>
> >> ----------------------
> >>
> >> daniel@snm-boi01:/var/lib/boinc/slots/0# tail -f wrapper_checkpoint.txt 2>/dev/null
> >> 1 448.900054
> >> 2 1351.808482 <-- should be 904
> >> 3 2710.013364
> >> daniel@snm-boi01:/var/lib/boinc/slots/0# cat stderr.txt
> >> wrapper: starting
> >> 17:31:17 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (1 filters.dat observations.dat)
> >> 17:38:52 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (2 filters.dat observations.dat)
> >> 17:46:27 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (3 filters.dat observations.dat)
> >> 17:54:04 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (4 filters.dat observations.dat)
> >>
> >> ------------------------
> >>
> >> Notice the checkpoint times are way off the mark. E.g. only 17:54:04 - 17:31:17 = 1367 seconds of wall-clock time passed between the start of subtask 1 and the end of subtask 3, yet the checkpoint records 2710 seconds of CPU time. CPU time is being added incorrectly as subtasks finish, checkpoint and move on to the next one.
> >>
> >> I don't have immediate access to a Windows build environment for BOINC, so I can't test whether the "0 second" reported CPU time problem still occurs with the latest wrapper. However, I'm more concerned about the incorrect CPU checkpoint time at the moment.
> >>
> >> I just want to re-emphasise that this issue does not occur with the server_stable branch wrapper release.
> >>
> >> Here are some actual live runs showing the difference in CPU time between versions:
> >>
> >> Wrong CPU time (most recent version):
> >> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1492571
> >> Right CPU time (old version and with fix):
> >> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1487356
> >>
> >>
> >> On Mon, Jan 7, 2013 at 4:07 PM, David Anderson <[email protected]> wrote:
> >>
> >>> That looks like an old version of wrapper.cpp.
> >>> Try the one in trunk.
> >>> -- David
> >>>
> >>> On 06-Jan-2013 7:23 PM, Daniel Carrion wrote:
> >>>> This concerns wrapper.cpp provided under boinc/samples/wrapper/wrapper.cpp.
> >>>> It seems like we're getting wrong CPU times under Linux, and I believe the same goes for Mac.
> >>>>
> >>>> Section of code this concerns (around line 804 of wrapper.cpp, as subtasks finish in main()):
> >>>>
> >>>> checkpoint_cpu_time = task.starting_cpu + task.final_cpu_time;
> >>>>
> >>>> fprintf(stderr, "checkpoint_cpu_time = starting_cpu (%f) + final_cpu_time (%f)\n",
> >>>>     task.starting_cpu, task.final_cpu_time);
> >>>>
> >>>> write_checkpoint(i+1, checkpoint_cpu_time);
> >>>>
> >>>> Note: I added the above fprintf line for debugging.
> >>>>
> >>>> We see this in the stderr.txt file as subtasks run (and are checkpointed as they finish):
> >>>>
> >>>> $tail -f stderr.txt
> >>>> wrapper: starting
> >>>> 17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
> >>>> checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
> >>>> 17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
> >>>> checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
> >>>> 18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
> >>>> checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
> >>>> 18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
> >>>>
> >>>> See how final_cpu_time makes checkpoint_cpu_time incorrect, and with it the starting_cpu of the next task, since that is taken from this value. final_cpu_time already grows by roughly 450 seconds per subtask (447, 897, 1350), so adding starting_cpu on top counts the earlier subtasks twice. If I change checkpoint_cpu_time to be final_cpu_time only, the problem goes away.
> >>>>
> >>>> Something else we noticed is that the CPU time reported on Windows machines is nearly always 0.0 seconds. Not sure if this is related, as I haven't looked into it further.
> >>>>
> >>>> One more thing to note: I don't see this issue on Linux with the wrapper provided in the server_stable branch of the old SVN repo.
> >>>>
> >>>> I'm hoping that David A. picks this up. I tried to keep it as short as possible; let me know if more details are required.
> >>>> _______________________________________________
> >>>> boinc_dev mailing list
> >>>> [email protected]
> >>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> >>>> To unsubscribe, visit the above URL and
> >>>> (near bottom of page) enter your email address.
