Just wondering if any of the BOINC devs have considered this issue any further? We usually use the latest wrapper at boinc/samples as it seems to be receiving new features; however, if this CPU time calc problem isn't going to be considered a real issue/bug we may have to fork...
Can someone from the BOINC dev team indicate either way so I know what path to go down with this?

To summarise the issue again: CPU time is calculated incorrectly as the wrapper checkpoints and moves on to the next task. It affects UNIX machines, i.e. Linux, Darwin, Android, etc.

Debug output showing incorrect checkpoint_cpu_time calculation as tasks switch.

=========================================================================================
$tail -f stderr.txt
wrapper: starting
17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
==========================================================================================

---
Daniel

On Thu, Jan 10, 2013 at 10:06 AM, Daniel Carrion <[email protected]> wrote:

> On my Linux machine:
>
> Cloned the main git repo. Compiled BOINC followed by sample wrapper.
> Copied wrapper over to project dir in place of existing/old wrapper -
> Fairly significant size difference. I'm guessing it's that zipping
> functionality.
>
> Unfortunately...Same problem seems to be happening.
> I.e.:
>
> ----------------------
>
> daniel@snm-boi01:/var/lib/boinc/slots/0# tail -f wrapper_checkpoint.txt 2>/dev/null
> 1 448.900054
> 2 1351.808482 <-- should be 904
> 3 2710.013364
> daniel@snm-boi01:/var/lib/boinc/slots/0# cat stderr.txt
> wrapper: starting
> 17:31:17 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (1 filters.dat observations.dat)
> 17:38:52 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (2 filters.dat observations.dat)
> 17:46:27 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (3 filters.dat observations.dat)
> 17:54:04 (30673): wrapper: running ../../projects/ec2-23-23-126-96.compute-1.amazonaws.com_pogs/fit_sed (4 filters.dat observations.dat)
>
> ------------------------
>
> Notice the checkpoint times are way off the mark. E.g. 17:54:04 - 17:31:17
> != 2710 seconds. They're adding CPU time incorrectly as sub-tasks are
> finishing, check-pointing and moving onto next.
>
> I don't have immediate access to Windows build environment for BOINC, so I
> can't test if that "0 second" report time problem is still occurring with
> the latest wrapper. However, I'm more concerned about that incorrect CPU
> checkpoint time at the moment.
>
> I just want to re-emphasise that this issue does not occur with
> server_stable branch wrapper release.
>
> Here's some actual live runs to show you the difference between CPU time
> between versions:
>
> Wrong CPU time (most recent version):
> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1492571
> Right CPU time (old version and with fix):
> http://ec2-23-23-126-96.compute-1.amazonaws.com/pogs/result.php?resultid=1487356
>
>
> On Mon, Jan 7, 2013 at 4:07 PM, David Anderson <[email protected]> wrote:
>
>> That looks like an old version of wrapper.cpp.
>> Try the one in trunk.
>> -- David
>>
>> On 06-Jan-2013 7:23 PM, Daniel Carrion wrote:
>> > This concerns wrapper.cpp provided under boinc/samples/wrapper/wrapper.cpp.
>> > Seems like we're getting wrong CPU times calculating under Linux, and I
>> > believe same goes for Mac.
>> >
>> > Section of code this concerns (as subtasks finish in main()):
>> >
>> > checkpoint_cpu_time = task.starting_cpu + task.final_cpu_time;
>> >
>> > fprintf(stderr, "checkpoint_cpu_time = starting_cpu (%f) + final_cpu_time (%f)\n",
>> >     task.starting_cpu, task.final_cpu_time);
>> >
>> > write_checkpoint(i+1, checkpoint_cpu_time);
>> >
>> > Note: I added the above fprintf line for debugging.
>> >
>> > We see this in stderr.txt file as subtasks run (and checkpointed as they
>> > finish)
>> >
>> > $tail -f stderr.txt
>> > wrapper: starting
>> > 17:52:25 (9875): wrapper: running fit_sed (1 filters.dat observations.dat)
>> > checkpoint_cpu_time = starting_cpu (0.000000) + final_cpu_time (447.131944)
>> > 17:59:53 (9875): wrapper: running fit_sed (2 filters.dat observations.dat)
>> > checkpoint_cpu_time = starting_cpu (447.131944) + final_cpu_time (897.368082)
>> > 18:07:25 (9875): wrapper: running fit_sed (3 filters.dat observations.dat)
>> > checkpoint_cpu_time = starting_cpu (1344.500026) + final_cpu_time (1350.548404)
>> > 18:14:59 (9875): wrapper: running fit_sed (4 filters.dat observations.dat)
>> >
>> > See how the final_cpu_time is causing the checkpoint_cpu_time to be
>> > incorrect and therefore the starting_cpu_time in the next task since it
>> > uses this value. If I change the checkpoint_cpu_time to be final_cpu_time
>> > only, the problem goes away.
>> >
>> > Something else that we noticed is that the CPU time reported on Windows
>> > machines is nearly always 0.0 seconds. Not sure if this is related as I
>> > haven't looked into it further.
>> >
>> > One more thing to note, I don't see this issue on Linux with the wrapper
>> > provided at server_stable branch on old SVN repo.
>> >
>> > I'm hoping that David A. picks this up. Tried to keep it as short as
>> > possible - let me know if more details required.
>> > _______________________________________________
>> > boinc_dev mailing list
>> > [email protected]
>> > http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> > To unsubscribe, visit the above URL and
>> > (near bottom of page) enter your email address.
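
For reference, here is a minimal sketch of the arithmetic behind the debug output in this thread. It assumes the logged final_cpu_time values are already cumulative across subtasks, which the figures (447, 897, 1350) suggest but which nobody in the thread has confirmed. It is not code from wrapper.cpp: replay(), print_comparison() and kLoggedFinalCpu are hypothetical names made up for illustration, and the two formulas are simply the quoted line and the "final_cpu_time only" change mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// final_cpu_time values copied from the stderr.txt output in this thread.
const std::vector<double> kLoggedFinalCpu = {447.131944, 897.368082, 1350.548404};

// Formula from the quoted snippet: previous checkpoint plus final_cpu_time.
double checkpoint_as_quoted(double starting_cpu, double final_cpu_time) {
    return starting_cpu + final_cpu_time;
}

// Change described in the thread: use final_cpu_time on its own.
// ASSUMPTION: only correct if final_cpu_time is already cumulative.
double checkpoint_workaround(double /*starting_cpu*/, double final_cpu_time) {
    return final_cpu_time;
}

// Hypothetical helper: replays the logged values through one formula.
// Each checkpoint becomes the starting_cpu of the next subtask.
std::vector<double> replay(const std::vector<double>& final_cpu_times,
                           double (*formula)(double, double)) {
    std::vector<double> checkpoints;
    double starting_cpu = 0.0;
    for (double final_cpu_time : final_cpu_times) {
        double checkpoint_cpu_time = formula(starting_cpu, final_cpu_time);
        checkpoints.push_back(checkpoint_cpu_time);
        starting_cpu = checkpoint_cpu_time;
    }
    return checkpoints;
}

// Hypothetical helper: prints both results side by side, one row per subtask.
void print_comparison(const std::vector<double>& final_cpu_times) {
    std::vector<double> quoted = replay(final_cpu_times, checkpoint_as_quoted);
    std::vector<double> workaround = replay(final_cpu_times, checkpoint_workaround);
    for (std::size_t i = 0; i < final_cpu_times.size(); ++i) {
        std::printf("subtask %zu: quoted formula %f, final_cpu_time only %f\n",
                    i + 1, quoted[i], workaround[i]);
    }
}
```

Calling print_comparison(kLoggedFinalCpu) from a main() prints the two columns. Under the stated assumption, the quoted formula reproduces the 1344.500026 that shows up as starting_cpu for subtask 3, while the "final_cpu_time only" column stays at roughly 450 seconds per subtask.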
