I wonder about this idly while I'm waiting for a minimization (usually of a single theorem) to run. So although I don't have as much reason to know as someone planning large-scale minimization runs, thank you for posting these results.
On December 31, 2019 3:47:00 AM PST, savask <[email protected]> wrote:

A few days ago I stumbled upon a topic "Formula wanted to estimate 'minimize' run times <https://groups.google.com/forum/#!topic/metamath/sODHzKvNsgs>" by Norman, where he suggested an interesting problem of finding an approximate formula for minimization run time based on proof length and label number. It would allow (in theory) scheduling minimization jobs more evenly, which must make a difference when performing full set.mm minimizations.

I decided to give this task a shot, but in the end the results turned out to be more dull than I had hoped for. I think it's still worth reporting on these finds, as maybe someone with better data-analysis skills will get interested in the problem. So, my procedure was as follows.

First of all, I prepared the data as described in the original topic by Norman. Basically, I took run times from the minimization log by Dylan Houlihan and augmented them with label numbers and proof sizes, getting a total of 28438 items (a few had to be removed since, for instance, 4exdistrOLD wasn't present in the set.mm version linked in the post). Label numbers in mathboxes were shifted up by 1070 (as suggested in the original post) to compensate for the fact that 'minimize' doesn't scan mathboxes by default.

Perhaps not surprisingly, the majority of theorems (~76%) take less than 15 seconds to minimize. The mean minimization time is ~45 s and the standard deviation is ~228, so already the "formula" predicting 0 s for each theorem gives us a "small" mean error :-) Intuitively it seems that the formula ought to look like c*label^a*size^b, where a, b and c are some constants. I have tried various curve-fitting techniques available in Wolfram Mathematica; for instance, it suggested the following: 0.0000811892*label^0.405776*size^1.33858, which has mean error ~35 and standard deviation ~155.
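A fit of the form c*label^a*size^b can be sketched by ordinary least squares in log space, since log(time) = log(c) + a*log(label) + b*log(size) is linear in the unknowns. The arrays below are synthetic stand-ins, not the real dataset (which is linked at the end of the post); the times are generated exactly from the Mathematica-fitted formula, so the regression should recover its parameters.

```python
import numpy as np

# Synthetic stand-in data (NOT the real dataset); run times are generated
# exactly from the fitted formula 8.11892e-5 * label^0.405776 * size^1.33858.
label = np.array([100.0, 5000.0, 12000.0, 20000.0, 28000.0])
size = np.array([50.0, 400.0, 1500.0, 3000.0, 9000.0])
time = 8.11892e-5 * label**0.405776 * size**1.33858

# log(time) = log(c) + a*log(label) + b*log(size) is linear in (log c, a, b),
# so an ordinary least-squares fit in log space suffices.
X = np.column_stack([np.ones_like(label), np.log(label), np.log(size)])
coef, *_ = np.linalg.lstsq(X, np.log(time), rcond=None)
c, a, b = np.exp(coef[0]), coef[1], coef[2]
```

One caveat: fitting in log space minimizes relative rather than absolute error, so on real (noisy) data this sketch may produce somewhat different constants than a nonlinear fit like Mathematica's.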
A simpler hand-found formula, 4.33e-9*label*size^1.7, produces a comparable mean error ~25 and standard deviation ~180.

These formulae have a small "global" coefficient (~1e-9 in the latter example), which suggests that perhaps one should differentiate between the small-run-time and large-run-time cases. I set the small class to be theorems taking less than 15 s to minimize and trained a basic linear classifier to distinguish it from the "large time" class. I curve-fitted a formula of the form c*label^a*size^b separately in the small and large cases to obtain the following prediction formula:

If -0.0572153 + 2.68348e-6*label + 0.000206234*size < 0.5 then 4.92e-6*label^0.68*size^1.03 else 0.000479*label^0.32*size^1.25,

which has mean error ~28 and standard deviation ~156. Clearly it's not a large improvement over the simpler formulae listed above.

That is more or less what I have for approximating minimization run times. Now, the intended application of such an approximation is scheduling minimization jobs evenly, so let's switch our attention to that.

I assumed that we have 16 CPUs (processes, threads, etc.) available, and we schedule our theorems for minimization by the Longest Processing Time algorithm (each task is assigned to the CPU with the smallest total load). Obviously, during the scheduling phase we estimate time by one of our formulae, but after theorems have been spread across CPUs, we can evaluate our procedure by calculating real run times. The relevant parameters are the average CPU load, which is always ~79495 s, and the latency, which I define to be the difference between the maximum CPU time and the minimum CPU time (the smaller the latency is, the closer the CPU loads are). It is useful to compare our results with a straightforward scheduling approach where one randomly partitions theorems into 16 (almost) equal lists. In my experiments such an approach yields latency ~39397, so it can be taken as a base value.
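The scheduling and evaluation procedure described above can be sketched as follows (a minimal sketch; the function names are mine, and the heap-based greedy assignment is the standard way to implement LPT):

```python
import heapq

def lpt_schedule(predicted, n_cpus=16):
    """Longest Processing Time: take jobs in decreasing order of predicted
    time and always give the next job to the currently least-loaded CPU."""
    heap = [(0.0, cpu, []) for cpu in range(n_cpus)]  # (load, cpu id, job indices)
    heapq.heapify(heap)
    for job in sorted(range(len(predicted)), key=lambda j: -predicted[j]):
        load, cpu, jobs = heapq.heappop(heap)  # least-loaded CPU so far
        jobs.append(job)
        heapq.heappush(heap, (load + predicted[job], cpu, jobs))
    return [jobs for _, _, jobs in sorted(heap, key=lambda t: t[1])]

def latency(real_times, schedule):
    """Evaluate a schedule using the *real* run times: the difference
    between the most- and least-loaded CPU."""
    totals = [sum(real_times[j] for j in jobs) for jobs in schedule]
    return max(totals) - min(totals)
```

A classical result of Graham guarantees that LPT's makespan is within a factor 4/3 - 1/(3m) of optimal for m machines, so with accurate time predictions the loads should end up reasonably balanced.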
So, the first formula (together with LPT scheduling) produces the following list of CPU run times:

[76126, 83042, 81573, 90121, 72944, 79387, 74600, 84677, 78886, 69388, 86309, 83830, 69727, 74918, 81779, 84567],

which has latency ~20733 and standard deviation ~6055. The second (hand-found) formula gives latency ~32342 and standard deviation ~8781, which is an arguably worse result (the latency is not much different from the random approach). Finally, the last formula (the piecewise one) gives latency ~40756 and standard deviation ~9684, which is worse than the random approach.

What is, in my opinion, surprising is that the much simpler predictor 'size' (i.e. minimization time is proportional to the proof size; notice that we don't need a multiplicative constant, since we only sum and compare values in our scheduling algorithm) produces latency ~19218 and standard deviation ~6388, which is on par with the first formula found by curve-fitting. I have tried other simple expressions without great success, so this seems like the best result I can get.

That is how far I could get with my basic knowledge of statistics. Apparently the end result is that one can simply estimate a theorem's minimization time by its proof size and then use well-known scheduling algorithms to assign jobs :-) I couldn't find any reports on the minimization run performed by Dylan Houlihan (for instance, I don't know the "latency" parameter for that run or even the number of threads used), so I don't know whether the approach I'm suggesting has any benefits over the method used by Norman and Dylan. Nevertheless, I hope this small report was interesting to read, and maybe it will motivate other people to try to come up with a better formula and/or approach. Here's the dataset I used: https://gist.github.com/savask/900161e9d093a5b787b6473113dfde16. I'm also looking forward to any questions or criticism regarding the approaches used here.
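The latency of the per-CPU totals reported for the first formula can be recomputed directly from the list above (assuming the reported ~6055 is the sample standard deviation):

```python
import statistics

# Per-CPU run-time totals reported for the first (curve-fitted) formula.
cpu_times = [76126, 83042, 81573, 90121, 72944, 79387, 74600, 84677,
             78886, 69388, 86309, 83830, 69727, 74918, 81779, 84567]

lat = max(cpu_times) - min(cpu_times)  # "latency" as defined in the post
sd = statistics.stdev(cpu_times)       # sample standard deviation
mean = statistics.mean(cpu_times)      # average CPU load
```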
I can't boast a good knowledge of statistics or data science and my "analysis" must be taken with a grain of salt, so I will be happy to hear the opinions of other people.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/e1867625-9c82-4e9d-9721-962590b6bd2b%40googlegroups.com.
