I'm participating in the testing of the new N-body application from MilkyWay, as discussed in the news items on their home page http://milkyway.cs.rpi.edu/milkyway/
This is still very much a work in progress, and I think the developers are concentrating on their own application code: I'm interested in watching how well BOINC services the frequent deployment of new versions.

The new application is intended, in due course, to run multi-threaded. I'm testing the Windows version only, which for the time being reports:

    Using OpenMP 1 max threads on a system with 8 processors

Our Linux friends are having separate issues with multithreading, but for my tests the app is behaving as a normal single-threaded, CPU-only application, and no <plan_class> has been specified in the application deployment. Plain vanilla throughout.

My test host is http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=465695

From the application details link, you can see that I have completed 139 tasks with the v1.04 application, at an Average Processing Rate of 1696.960069067: my i7 is good, but not that good. I suspect the APR has been inflated by an early succession of very short-running tasks - runtimes for this application have varied between about one minute and 42 hours (many of the shortest ones have already been purged from the server record).

The high APR affects the client's estimation of runtimes. My most recently reported task, the one with the 42-hour runtime, was issued with <rsc_fpops_est> 480,966,000,000,000,000 - that's half a zetta-fpop. At the APR speed, that would be an estimated runtime of 78.73 hours - a trifle high, but perfectly reasonable. However, the server didn't tell the client the true APR value - instead, it passed down a <flops> value of 44,300,438,584. 44.3 GFLOPS is still high for a CPU, but bears no relation to the APR: it turns out to be precisely (to 16 significant digits) 10 times the current Whetstone benchmark. And it leads the client to make a runtime estimate of 480,966,000,000,000,000 / 44,300,438,584 / 3600 = 3015.8 hours, or 125 days - difficult to achieve within a 12-day deadline.
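For anyone who wants to check the figures, here is a small Python sketch reproducing both estimates from the numbers above (the variable names are mine; APR is taken to be in GFLOPS, <flops> in plain FLOPS):

```python
# Reproduce the two runtime estimates from the numbers quoted in this post.
rsc_fpops_est = 480_966_000_000_000_000   # <rsc_fpops_est>: half a zetta-fpop
apr_gflops = 1696.960069067               # Average Processing Rate from the web page
flops_sent = 44_300_438_584               # <flops> value actually sent to the client

# The estimate the server could have made from the raw APR:
hours_from_apr = rsc_fpops_est / (apr_gflops * 1e9) / 3600
print(f"APR-based estimate:   {hours_from_apr:.2f} hours")    # 78.73

# The estimate the client actually makes from <flops>:
hours_from_flops = rsc_fpops_est / flops_sent / 3600
print(f"flops-based estimate: {hours_from_flops:.1f} hours")  # 3015.8

# The <flops> value is exactly 10x an implied Whetstone benchmark of:
print(f"implied Whetstone:    {flops_sent / 10:,.1f} FLOPS")
```

The two estimates differ by a factor of about 38, which is exactly the gap between the raw APR and the 10x-Whetstone figure the server chose to send down.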
It seems to me that the server speaks with forked tongue, using the raw unadjusted APR when assessing workunit sizes, but telling the client that it should use something different. That disrupts the scheduling of other projects' jobs on the host by invoking High Priority running - as shown in the screenshot I posted in the project's discussion thread on this application, http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3102&nowrap=true#56784

I don't know exactly how current the Milkyway server code is (and have no easy way of finding out, now that the SVN numbers have been removed from both the server status page and the user web pages), but Travis does mention updating both the database and the server daemons on 22 December 2012 (front page), so I'm guessing it's pretty close to current trunk. The project is running (thankfully, in this case) with <dont_use_dcf>, which also points to recent code.

I've said it before, but I'll say it again: I really think that the whole BOINC client-server ensemble needs some serious robustification to catch and protect against outlandish values. I'll defer to Josef Segur on this (he's a better codewalker than I am), but I think he's posted before that once APR reaches the stratosphere like this, there's no automated soft landing of the sort that was built into the older BOINC code before CreditNew.

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
