I wonder about this idly while I'm waiting for a (usually single-theorem) 
minimization to run. So although I don't have as much reason to know as someone 
planning large-scale minimization runs, thank you for posting these results.

On December 31, 2019 3:47:00 AM PST, savask <[email protected]> wrote:
>A few days ago I stumbled upon the topic "Formula wanted to estimate 
>'minimize' run times 
><https://groups.google.com/forum/#!topic/metamath/sODHzKvNsgs>" by Norman, 
>where he posed an interesting problem: find an approximate formula for 
>minimization run time based on proof length and label number. Such a formula 
>would (in theory) allow scheduling minimization jobs more evenly, which 
>should make a difference when performing full set.mm minimizations.
>
>I decided to give this task a shot, but in the end the results turned out 
>duller than I had hoped. I think it is still worth reporting these findings, 
>as maybe someone with better data analysis skills will get interested in the 
>problem. My procedure was as follows.
>
>First of all, I prepared the data as described in the original topic by 
>Norman. Basically, I took run times from the minimization log by Dylan 
>Houlihan and augmented them with label numbers and proof sizes, for a total 
>of 28438 items (a few had to be removed since, for instance, 4exdistrOLD 
>wasn't present in the set.mm version linked in the post). Label numbers in 
>mathboxes were shifted up by 1070 (as suggested in the original post) to 
>compensate for the fact that 'minimize' doesn't scan mathboxes by default.
>
>Perhaps not surprisingly, the majority of theorems (~76%) take less than 15 
>seconds to minimize. The mean minimization time is ~45s and the standard 
>deviation is ~228s, so already the "formula" predicting 0s for each theorem 
>gives us a "small" mean error :-) Intuitively, the formula ought to look 
>like c*label^a*size^b, where a, b and c are some constants. I tried various 
>curve-fitting techniques available in Wolfram Mathematica; for instance, it 
>suggested 0.0000811892*label^0.405776*size^1.33858, which has mean error ~35 
>and standard deviation ~155. A simpler hand-found formula, 
>4.33e-9*label*size^1.7, produces a comparable mean error of ~25 and standard 
>deviation of ~180.
>
>These formulae have a small "global" coefficient (~1e-9 in the latter 
>example), which suggests that perhaps one should distinguish between the 
>small run time and large run time cases. I set the small class to be 
>theorems taking less than 15s to minimize and trained a basic linear 
>classifier to separate it from the "large time" class. I then curve-fitted 
>a formula of the form c*label^a*size^b separately in the small and large 
>cases to obtain the following prediction formula:
>
>If -0.0572153 + 2.68348e-6*label + 0.000206234*size < 0.5 then 
>4.92e-6*label^0.68*size^1.03 else 0.000479*label^0.32*size^1.25,
>
>which has mean error ~28 and standard deviation ~156. Clearly it is not a 
>large improvement over the simpler formulae listed above.
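(Editorial aside: the piecewise predictor is straightforward to evaluate; here is a direct Python transcription, with the constants copied verbatim from the formula above.)

```python
def predict_minimize_time(label, size):
    """Piecewise prediction: a linear classifier picks the 'small' or
    'large' run-time regime, then the regime's power law applies."""
    if -0.0572153 + 2.68348e-6 * label + 0.000206234 * size < 0.5:
        return 4.92e-6 * label**0.68 * size**1.03   # small-time regime
    return 0.000479 * label**0.32 * size**1.25      # large-time regime
```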
>
>That is more or less what I have for approximating minimization run times. 
>Now, the intended application of such an approximation is scheduling 
>minimization jobs evenly, so let's switch our attention to that.
>
>I assumed that we have 16 CPUs (processes, threads, etc.) available and that 
>we schedule theorems for minimization by the Longest Processing Time (LPT) 
>algorithm: jobs are sorted by decreasing predicted time, and each job is 
>assigned to the CPU with the smallest total load. During the scheduling 
>phase we of course estimate times with one of our formulae, but after the 
>theorems have been spread across CPUs we can evaluate the procedure using 
>the real run times. The relevant parameters are the average CPU load, which 
>is always ~79495s since the total work is fixed, and the latency, which I 
>define as the difference between the maximum and minimum CPU times (the 
>smaller the latency, the closer the CPU loads). It is useful to compare our 
>results with a straightforward scheduling approach where one randomly 
>partitions the theorems into 16 (almost) equal lists. In my experiments such 
>an approach yields latency ~39397, so it can be taken as a baseline.
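(Editorial aside: the LPT heuristic just described can be sketched in a few lines of Python. This illustrates the standard algorithm with a min-heap over CPU loads; it is not the script actually used for the experiments.)

```python
import heapq

def lpt_schedule(predicted, n_cpus=16):
    """Longest Processing Time scheduling: sort jobs by decreasing
    predicted time, then give each job to the least-loaded CPU."""
    heap = [(0.0, cpu) for cpu in range(n_cpus)]  # (total load, cpu id)
    assignment = [[] for _ in range(n_cpus)]
    for job, t in sorted(enumerate(predicted), key=lambda x: -x[1]):
        load, cpu = heapq.heappop(heap)           # least-loaded CPU so far
        assignment[cpu].append(job)
        heapq.heappush(heap, (load + t, cpu))
    return assignment
```

To evaluate a formula as in the post, one would schedule with the *predicted* times and then recompute each CPU's total from the *real* run times of its assigned jobs.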
>
>So, the first formula (together with LPT scheduling) produces the following 
>list of CPU run times:
>
>[76126,83042,81573,90121,72944,79387,74600,84677,78886,69388,86309,83830,69727,74918,81779,84567],
>
>which has latency ~20733 and standard deviation ~6055. The second 
>(hand-found) formula gives latency ~32342 and standard deviation ~8781, 
>arguably a worse result (the latency is not much different from the random 
>approach). Finally, the last (piecewise) formula gives latency ~40756 and 
>standard deviation ~9684, which is worse than the random approach.
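(Editorial check, not part of the original post: the summary numbers are easy to recompute from the list above, and the ~6055 figure appears to use the sample, i.e. n-1, standard deviation convention.)

```python
import statistics

# Per-CPU run times from the first (curve-fitted) formula with LPT scheduling.
cpu_times = [76126, 83042, 81573, 90121, 72944, 79387, 74600, 84677,
             78886, 69388, 86309, 83830, 69727, 74918, 81779, 84567]

latency = max(cpu_times) - min(cpu_times)  # gap between extreme CPU loads
spread = statistics.stdev(cpu_times)       # sample standard deviation
```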
>
>What is, in my opinion, surprising is that the simple formula 'size' (i.e. 
>minimization time proportional to proof size; note that we don't need a 
>multiplicative constant, since we only sum and compare values in the 
>scheduling algorithm) produces latency ~19218 and standard deviation ~6388, 
>on par with the first formula found by curve-fitting. I have tried other 
>simple expressions without great success, so this seems like the best result 
>I can get.
>
>That is how far I could get with my basic knowledge of statistics. 
>Apparently the end result is that one can simply estimate a theorem's 
>minimization time by its proof size and then use well-known scheduling 
>algorithms to assign the jobs :-) I couldn't find any reports on the 
>minimization run performed by Dylan Houlihan (for instance, I don't know 
>the "latency" of that run or even the number of threads used), so I don't 
>know whether the approach I'm suggesting has any benefits over the method 
>used by Norman and Dylan. Nevertheless, I hope this small report was 
>interesting to read, and maybe it will motivate others to come up with a 
>better formula and/or approach. Here's the dataset I used: 
>https://gist.github.com/savask/900161e9d093a5b787b6473113dfde16. I'm also 
>looking forward to any questions or criticism regarding the approaches used 
>here. I can't boast a good knowledge of statistics or data science and my 
>"analysis" should be taken with a grain of salt, so I will be happy to hear 
>other people's opinions.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/838F9EFB-5819-4467-8C6E-0E3DCE159754%40panix.com.