I'm participating in the testing of the new N-body application from MilkyWay, 
as discussed in the news items on their home page 
http://milkyway.cs.rpi.edu/milkyway/

This is still very much a work in progress, and I think the developers are 
concentrating on their own application code: I'm interested in watching how 
well BOINC services the frequent deployment of new versions.

The new application is intended, in due course, to run multi-threaded: I'm 
testing the Windows version only, which for the time being reports:
  Using OpenMP 1 max threads on a system with 8 processors
Our Linux friends are having separate issues with multithreading, but for my 
tests the app is behaving as a normal single-threaded CPU-only application, and 
no <plan_class> has been specified in the application deployment. Plain vanilla 
throughout.
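
For what it's worth, that "Using OpenMP..." line looks like plain OpenMP
introspection. A minimal sketch of the sort of code that would produce it (my
guess, not the app's actual source):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        /* omp_get_max_threads() is how many threads a parallel region
           would use; omp_get_num_procs() is how many processors the
           runtime can see. On my host these report 1 and 8. */
        printf("Using OpenMP %d max threads on a system with %d processors\n",
               omp_get_max_threads(), omp_get_num_procs());
        return 0;
    }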

My test host is 
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=465695

From the application details link, you can see that I have completed 139 tasks 
with the v1.04 application, at an Average Processing Rate of 1696.960069067 
Gflops: my i7 is good, but not that good.

I suspect the APR has been inflated by an early succession of very 
short-running tasks - runtimes for this application have varied between about 
one minute and 42 hours (many of the shortest ones have been purged from the 
server record already). 
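
To see how that happens, here's a toy model of the averaging - BOINC's real
APR bookkeeping (lib/average.h) is exponentially weighted, but the effect is
the same, and all the numbers below are invented for illustration:

    #include <stdio.h>

    int main(void) {
        /* APR behaves like an average of (claimed fpops / elapsed time)
           over validated tasks */
        double sum = 0.0;
        int n = 0;
        /* a burst of early tasks with big fpops estimates but ~60 s runtimes */
        for (int i = 0; i < 10; i++) {
            sum += 4.8e17 / 60.0;   /* ~8e15 flops/s per task */
            n++;
        }
        sum += 3.0e9;               /* one honest long task at ~3 Gflops */
        n++;
        printf("APR ~ %.2e flops/s\n", sum / n);  /* short tasks dominate */
        return 0;
    }

Once those early outliers are in the average, a single honest task barely
moves it.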

The high APR affects the client's runtime estimates. My most 
recently-reported task, the one with the 42-hour runtime, was issued with 
<rsc_fpops_est> 480,966,000,000,000,000 - that's half an exa-fpop. At the APR 
speed, that would be an estimated runtime of 78.73 hours - a trifle high, but 
perfectly reasonable.
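
For completeness, that figure is just the obvious division, with the APR 
converted to flops/sec:

480,966,000,000,000,000 / 1,696,960,069,067 / 3600 = 78.73 hours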

However, the server didn't tell the client about the true APR value - instead, 
it passed down a <flops> value of 44,300,438,584: 44.3 Gflops is still high 
for a CPU, but it bears no relation to the APR.

It turns out to be precisely (to 16 significant digits) 10 times the current 
whetstone benchmark - i.e. a whetstone of about 4.43 Gflops, which is entirely 
plausible for one core of an i7. But it leads the client to make a runtime 
estimate of

480,966,000,000,000,000 / 44,300,438,584 / 3600 = 3015.8 hours, or 125 days - 
difficult to achieve within a 12-day deadline.
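
The client, as far as I can tell, is doing the obvious thing with the value
it was given. A minimal sketch of the estimate (ignoring refinements like the
duration correction factor, which this project disables anyway; not the real
client source):

    #include <stdio.h>

    int main(void) {
        double rsc_fpops_est = 480966e12;   /* from the workunit */
        double flops = 44300438584.0;       /* <flops> from the scheduler */
        printf("estimated runtime: %.1f hours\n",
               rsc_fpops_est / flops / 3600.0);   /* ~3015.8 */
        return 0;
    }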


It seems to me that the server speaks with forked tongue, using the raw, 
unadjusted APR when assessing workunit sizes, but telling the client to use 
something quite different. That disrupts the scheduling of other projects' 
jobs on the host by invoking High Priority running - as shown in the 
screenshot I posted in the project's discussion thread on this application, 
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3102&nowrap=true#56784
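
For anyone unfamiliar with the mechanism: the client checks whether each task
can finish inside its deadline at the project's normal share of processor
time, and switches to earliest-deadline-first (High Priority) when it can't.
A toy version of that test - my simplification, not the real client code:

    #include <stdbool.h>
    #include <stdio.h>

    static bool misses_deadline(double est_runtime_s,
                                double time_to_deadline_s,
                                double resource_share_fraction) {
        /* time actually available to this project before the deadline */
        double available = time_to_deadline_s * resource_share_fraction;
        return est_runtime_s > available;
    }

    int main(void) {
        double est = 3015.8 * 3600.0;            /* the bogus estimate */
        double deadline = 12.0 * 24.0 * 3600.0;  /* 12-day deadline */
        printf("high priority: %s\n",
               misses_deadline(est, deadline, 0.5) ? "yes" : "no");
        return 0;
    }

With a 125-day estimate against a 12-day deadline, the answer is "yes" at any
resource share, and everything else on the host gets pushed aside.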

I don't know exactly how current the Milkyway server code is (and have no easy 
way of finding out, now that the SVN numbers have been removed from both the 
server status page and the user web pages): but Travis does mention updating 
both the database and the server daemons on 22 December 2012 (front page), so 
I'm guessing it's pretty close to current trunk. The project is running 
(thankfully, in this case) <dont_use_dcf>, which also points to recent code.

I've said it before, but I'll say it again: I really think that the whole BOINC 
client-server ensemble needs some serious robustification to catch and protect 
against outlandish values. I'll defer to Josef Segur on this (he's a better 
codewalker than I am), but I think he's posted before that once APR reaches the 
stratosphere like this, there's no automated soft landing of the sort that was 
built into the older BOINC code before CreditNew.