And my HD7770 is getting the following at Albert because it hasn't finished 
its 11 validations for its app_version yet:
 
2014-06-05 09:56:29.7913 [PID=7201 ]    [version] looking for version of 
einsteinbinary_BRP4G
 2014-06-05 09:56:29.7913 [PID=7201 ]    [version] Checking plan class 
'BRP4G-opencl-ati'
 2014-06-05 09:56:29.7913 [PID=7201 ]    [version] plan_class_spec: parsed 
project prefs setting 'gpu_util_brp' : true : 1.000000
 2014-06-05 09:56:29.7913 [PID=7201 ]    [version] [AV#721] (BRP4G-opencl-ati) 
adjusting projected flops based on PFC avg: 34968.78G
 2014-06-05 09:56:29.7913 [PID=7201 ]    [version] Best app version is now 
AV721 (18620.28 GFLOP)
 2014-06-05 09:56:29.7913 [PID=7201 ]    [version] [AV#721] (BRP4G-opencl-ati) 
adjusting projected flops based on PFC avg: 34968.78G
 2014-06-05 09:56:29.7914 [PID=7201 ]    [version] Best version of app 
einsteinbinary_BRP4G is [AV#721] (34968.78 GFLOPS)
 
 2014-06-05 09:56:29.7974 [PID=7201 ]    [send] Sending app_version 
einsteinbinary_BRP4G 7 134 BRP4G-opencl-ati; projected 34968.78 GFLOPS
 2014-06-05 09:56:29.7976 [PID=7201 ]    [send] est. duration for WU 606407: 
unscaled 8.01 scaled 10.96
 2014-06-05 09:56:29.7976 [PID=7201 ]    [send] [HOST#8143] sending 
[RESULT#1454943 p2030.20131124.G176.16-01.04.S.b2s0g0.00000_3616_1] (est. dur. 
10.96s (0h00m10s95)) (max time 160.14s (0h02m40s14))

The real duration is going to be something like an hour, not the 11 seconds the 
server expects!!
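
To put numbers on that mismatch, here is a rough sketch in Python using only the figures from the log above. It assumes the scheduler's unscaled estimate is simply rsc_fpops_est divided by projected flops, and that "max time" is rsc_fpops_bound divided by the same projected flops; the workunit's actual rsc_fpops values are not shown in the log, so the ones below are back-calculated, and the one-hour real duration is only my rough guess.

```python
# Figures copied from the scheduler log above.
PROJECTED_GFLOPS = 34968.78  # "projected 34968.78 GFLOPS"
EST_UNSCALED_S = 8.01        # "unscaled 8.01"
MAX_TIME_S = 160.14          # "max time 160.14s"

# Assumption: "something like an hour" taken as 3600 seconds.
REAL_DURATION_S = 3600.0


def implied_fpops(seconds: float, gflops: float) -> float:
    """Operation count implied by a duration at a given speed (assumed model)."""
    return seconds * gflops * 1e9


def effective_gflops(fpops: float, seconds: float) -> float:
    """Speed the host would need to finish fpops in the given time."""
    return fpops / seconds / 1e9


# Back-calculated workunit estimate (not read from the workunit itself).
fpops_est = implied_fpops(EST_UNSCALED_S, PROJECTED_GFLOPS)

# How generous the bound is relative to the estimate.
bound_ratio = MAX_TIME_S / EST_UNSCALED_S

# How far a one-hour run overshoots the time limit.
overrun = REAL_DURATION_S / MAX_TIME_S

# Speed that would make the estimate match a one-hour run.
realistic_gflops = effective_gflops(fpops_est, REAL_DURATION_S)

print(f"implied rsc_fpops_est: {fpops_est:.3e}")
print(f"bound / estimate:      {bound_ratio:.1f}x")
print(f"real run vs max time:  {overrun:.1f}x over")
print(f"speed matching 1 hour: {realistic_gflops:.1f} GFLOPS")
```

On those assumptions the bound is about 20x the estimate, and a one-hour task still overshoots "max time" by more than 20x, so it would be killed with EXIT_TIME_LIMIT_EXCEEDED long before it could finish.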
 
https://albert.phys.uwm.edu/results.php?hostid=8143&offset=0&show_names=0&state=5&appid=29
 
Claggy


 
> Date: Sat, 7 Jun 2014 10:51:16 +0100
> From: [email protected]
> To: [email protected]
> Subject: [boinc_dev] EXIT_TIME_LIMIT_EXCEEDED (sorry, yes me again, but 
> please read)
> 
> And bad form, with two separate issues to report. Sorry again.
> 
> 1) Use of outlier detection to avoid skewed averages
> 2) Initial runtime estimates on the Android platform
> 
> 1) Outlier detection.
> 
> This arises from the recent introduction of a new app_version at the 
> LHCclassic project. LHC, by its very nature, is searching for the onset of 
> chaotic orbital behaviour in the simulated particle beam: they expect, and 
> actively want, many tasks to finish early.
> 
> Eric Mcintosh commented in a recent 'lessons learned' news item - 
> http://lhcathomeclassic.cern.ch/sixtrack/forum_thread.php?id=3838 - that 
> EXIT_TIME_LIMIT_EXCEEDED was his #1 problem following the new version 
> release. I've advised accordingly in that thread.
> 
> But I was surprised to find that outlier detection - an appropriate solution 
> to this particular case - wasn't documented in the developer Wiki: a 
> trac/wiki search only returns a single hit for 'outlier', and that's in 
> http://boinc.berkeley.edu/trac/wiki/ServerUpdates - which we seem to have 
> stopped updating. The one-line summary doesn't give much of a clue about when 
> and why this feature might be useful, and without a git translation the SVN 
> reference doesn't help either.
> 
> http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=e49f9459080b488f85fbcf8cdad6db9672416cf8
> 
> 
> 2) Android runtime estimates
> 
> The example here is from SIMAP. During a recent pause between batches, I 
> noticed that some of my 'pending validation' tasks were being slow to clear: 
> http://boincsimap.org/boincsimap/results.php?hostid=349248
> 
> The clearest example is the third of those three workunits: 
> http://boincsimap.org/boincsimap/workunit.php?wuid=57169928
> 
> Four of the seven replications have failed with 'Error while computing', and 
> every one of those four is an EXIT_TIME_LIMIT_EXCEEDED on an Android device.
> 
> Three of the four hosts have never returned a valid result (total credit 
> zero), so they have never had a chance to establish an APR for use in runtime 
> estimation: runtime estimates and bounds must have been generated by the 
> server.
> 
> It seems - from these results, and others I've found pending on other 
> machines - that SIMAP tasks on Android are aborted with 
> EXIT_TIME_LIMIT_EXCEEDED after ~6 hours elapsed. For the new batch released 
> today, SIMAP are using a 3x bound (which may be a bit low under the 
> circumstances):
> 
>     <rsc_fpops_est>13500000000000.000000</rsc_fpops_est>
>     <rsc_fpops_bound>40500000000000.000000</rsc_fpops_bound>
> 
> so I deduce that the tasks when first issued had a runtime estimate of ~2 
> hours.
> 
> My own tasks, on a fast Intel i5 'Haswell' CPU (APR 7.34 GFLOPS), take over 
> half an hour to complete: two hours for an ARM device sounds suspiciously 
> low. The only one of my Android wingmates to have registered an APR 
> (http://boincsimap.org/boincsimap/host_app_versions.php?hostid=771033) is 
> showing 1.69 GFLOPS, but I have no way of knowing whether that APR was 
> established before or after the task in question errored out.
> 
> From experience - borne out by current tests at Albert@Home, where server 
> logs are helpfully exposed to the public - initial server estimates can be 
> hopelessly over-optimistic. These two are for the same machine:
> 
> 2014-06-04 20:28:09.8459 [PID=26529] [version] [AV#716] (BRP4G-cuda32-nv301) 
> adjusting projected flops based on PFC avg: 2124.60G
> 2014-06-07 09:30:56.1506 [PID=10808] [version] [AV#716] (BRP4G-cuda32-nv301) 
> setting projected flops based on host elapsed time avg: 23.71G
> 
> Since SIMAP have recently announced that they are leaving the BOINC platform 
> at the end of the year (despite being an Android launch partner with 
> Samsung), I doubt they'll want to put much effort into researching this issue.
> 
> But if other projects experimenting with Android applications are 
> experiencing a high task failure rate, they might like to check whether 
> EXIT_TIME_LIMIT_EXCEEDED is a significant factor in those failures, and if 
> so, consider the other remediation approaches (apart from outlier detection, 
> which isn't relevant in this case) that I suggested to Eric Mcintosh at LHC.
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
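
The SIMAP arithmetic in the quoted message can be checked with a short Python sketch. It uses only the numbers quoted there, and assumes that both the runtime estimate and the time limit are the workunit's fpops figures divided by the same flops value; the variable and function names are mine, for illustration only.

```python
# Workunit figures quoted in the message.
RSC_FPOPS_EST = 13_500_000_000_000.0
RSC_FPOPS_BOUND = 40_500_000_000_000.0

# "aborted ... after ~6 hours elapsed" (approximate, as quoted).
OBSERVED_LIMIT_HOURS = 6.0

# APR values quoted in the message, in GFLOPS.
HASWELL_APR = 7.34
ANDROID_APR = 1.69

# The two Albert@Home log figures for the same host, in GFLOPS.
PFC_BASED = 2124.60
ELAPSED_BASED = 23.71


def runtime_minutes(fpops: float, gflops: float) -> float:
    """Runtime in minutes for fpops at the given speed (assumed model)."""
    return fpops / (gflops * 1e9) / 60.0


# Bound is 3x the estimate, so a 6 hour limit implies a 2 hour estimate.
bound_ratio = RSC_FPOPS_BOUND / RSC_FPOPS_EST
initial_estimate_hours = OBSERVED_LIMIT_HOURS / bound_ratio

# Speed the server must have assumed to arrive at that estimate.
implied_gflops = RSC_FPOPS_EST / (initial_estimate_hours * 3600.0) / 1e9

# Runtime each quoted APR would imply for the same workunit.
haswell_minutes = runtime_minutes(RSC_FPOPS_EST, HASWELL_APR)
android_minutes = runtime_minutes(RSC_FPOPS_EST, ANDROID_APR)

# How far the PFC-based figure exceeds the elapsed-time-based one.
overestimate = PFC_BASED / ELAPSED_BASED

print(f"bound / estimate:        {bound_ratio:.1f}x")
print(f"initial estimate:        {initial_estimate_hours:.1f} h")
print(f"implied server speed:    {implied_gflops:.3f} GFLOPS")
print(f"runtime at Haswell APR:  {haswell_minutes:.1f} min")
print(f"runtime at Android APR:  {android_minutes:.1f} min")
print(f"Albert PFC vs elapsed:   {overestimate:.1f}x")
```

The Haswell figure comes out at a little over 30 minutes, which agrees with the "over half an hour" reported in the message, and the two Albert log lines differ by a factor of roughly 90.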
                                          