And bad form, with two separate issues to report. Sorry again.

1) Use of outlier detection to avoid skewed averages
2) Initial runtime estimates on the Android platform

1) Outlier detection

This arises from the recent introduction of a new app_version at the LHCclassic project. LHC, by its very nature, is searching for the onset of chaotic orbital behaviour in the simulated particle beam: they expect, and actively want, many tasks to finish early.

Eric Mcintosh commented in a recent 'lessons learned' news item - http://lhcathomeclassic.cern.ch/sixtrack/forum_thread.php?id=3838 - that EXIT_TIME_LIMIT_EXCEEDED was his #1 problem following the new version release. I've advised accordingly in that thread.

But I was surprised to find that outlier detection - an appropriate solution to this particular case - wasn't documented in the developer Wiki: a trac/wiki search only returns a single hit for 'outlier', and that's in http://boinc.berkeley.edu/trac/wiki/ServerUpdates - which we seem to have stopped updating. The one-line summary doesn't give much of a clue about when and why this feature might be useful, and without a git translation the SVN reference doesn't help either.

http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=e49f9459080b488f85fbcf8cdad6db9672416cf8

2) Android runtime estimates

The example here is from SIMAP. During a recent pause between batches, I noticed that some of my 'pending validation' tasks were being slow to clear:

http://boincsimap.org/boincsimap/results.php?hostid=349248

The clearest example is the third of those three workunits:

http://boincsimap.org/boincsimap/workunit.php?wuid=57169928

Four of the seven replications have failed with 'Error while computing', and every one of those four is an EXIT_TIME_LIMIT_EXCEEDED on an Android device. Three of the four hosts have never returned a valid result (total credit zero), so they have never had a chance to establish an APR for use in runtime estimation: runtime estimates and bounds must have been generated by the server.

It seems - from these results, and others I've found pending on other machines - that SIMAP tasks on Android are aborted with EXIT_TIME_LIMIT_EXCEEDED after ~6 hours elapsed. For the new batch released today, SIMAP are using a 3x bound (which may be a bit low under the circumstances):

<rsc_fpops_est>13500000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>40500000000000.000000</rsc_fpops_bound>

so I deduce that the tasks, when first issued, had a runtime estimate of ~2 hours. My own tasks, on a fast Intel i5 'Haswell' CPU (APR 7.34 GFLOPS), take over half an hour to complete: two hours for an ARM device sounds suspiciously low.

The only one of my Android wingmates to have registered an APR (http://boincsimap.org/boincsimap/host_app_versions.php?hostid=771033) is showing 1.69 GFLOPS, but I have no way of knowing whether that APR was established before or after the task in question errored out.

From experience - borne out by current tests at Albert@Home, where server logs are helpfully exposed to the public - initial server estimates can be hopelessly over-optimistic. These two are for the same machine:

2014-06-04 20:28:09.8459 [PID=26529] [version] [AV#716] (BRP4G-cuda32-nv301) adjusting projected flops based on PFC avg: 2124.60G
2014-06-07 09:30:56.1506 [PID=10808] [version] [AV#716] (BRP4G-cuda32-nv301) setting projected flops based on host elapsed time avg: 23.71G

Since SIMAP have recently announced that they are leaving the BOINC platform at the end of the year (despite being an Android launch partner with Samsung), I doubt they'll want to put much effort into researching this issue. But if other projects experimenting with Android applications are experiencing a high task failure rate, they might like to check whether EXIT_TIME_LIMIT_EXCEEDED is a significant factor in those failures, and if so, consider the other remediation approaches (apart from outlier detection, which isn't relevant in this case) that I suggested to Eric Mcintosh at LHC.

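For anyone who wants to check the arithmetic behind the ~2 hour deduction, here is a rough sketch in Python. It assumes a simplified model that is mine, not taken from the server code: the runtime estimate is rsc_fpops_est divided by the projected flops, and the task is aborted once elapsed time passes rsc_fpops_bound divided by the same projected flops. The function names are hypothetical; the numbers are the ones quoted above.

```python
# Simplified model (an assumption, not the actual BOINC server/client code):
#   estimated runtime = rsc_fpops_est   / projected_flops
#   abort threshold   = rsc_fpops_bound / projected_flops

RSC_FPOPS_EST = 13_500_000_000_000.0    # from the SIMAP workunit quoted above
RSC_FPOPS_BOUND = 40_500_000_000_000.0  # from the SIMAP workunit quoted above
OBSERVED_LIMIT_S = 6 * 3600             # tasks were aborted after ~6 hours
I5_APR_FLOPS = 7.34e9                   # APR of the i5 'Haswell' host


def runtime_seconds(fpops: float, projected_flops: float) -> float:
    """Seconds needed to perform `fpops` operations at `projected_flops`."""
    return fpops / projected_flops


def implied_projected_flops(fpops_bound: float, limit_seconds: float) -> float:
    """Projected flops that would put the abort threshold at `limit_seconds`."""
    return fpops_bound / limit_seconds


# Ratio between bound and estimate (the "3x bound").
bound_ratio = RSC_FPOPS_BOUND / RSC_FPOPS_EST

# Speed the server must have assumed for the Android hosts, given a ~6 hour abort.
android_flops = implied_projected_flops(RSC_FPOPS_BOUND, OBSERVED_LIMIT_S)

# Initial runtime estimate implied by that speed.
android_estimate_s = runtime_seconds(RSC_FPOPS_EST, android_flops)

# Same workunit size at the i5's measured APR, for comparison.
i5_estimate_s = runtime_seconds(RSC_FPOPS_EST, I5_APR_FLOPS)

print(f"bound ratio:              {bound_ratio:.1f}x")
print(f"implied Android speed:    {android_flops / 1e9:.3f} GFLOPS")
print(f"implied initial estimate: {android_estimate_s / 3600:.2f} hours")
print(f"i5 at 7.34 GFLOPS:        {i5_estimate_s / 60:.1f} minutes")
```

Under those assumptions, the ~6 hour abort corresponds to a projected speed a little under 2 GFLOPS for hosts that had not yet returned a valid result, and the i5 figure comes out at roughly the half hour I actually observe.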
