A few days ago I stumbled upon a topic "Formula wanted to estimate 
'minimize' run times 
<https://groups.google.com/forum/#!topic/metamath/sODHzKvNsgs>" by Norman 
where he suggested an interesting problem of finding an approximate formula 
for minimization run time based on proof length and label number. In theory, 
it would allow one to schedule minimization jobs more evenly, which should 
make a difference when performing full set.mm minimizations.

I decided to give this task a shot, but in the end results turned out to be 
more dull than I had hoped. I think these findings are still worth 
reporting, as maybe someone with better data analysis skills will get 
interested in the problem. So, my procedure was as follows.

First of all, I prepared the data as described in the original topic by 
Norman. Basically, I took run times from the minimization log by Dylan 
Houlihan and augmented them with label numbers and proof sizes, getting a 
total of 28438 items (a few had to be removed since, for instance, 
4exdistrOLD wasn't present in the set.mm version linked in the post). Label 
numbers in mathboxes were shifted up by 1070 (as suggested in the original 
post) to compensate for the fact that 'minimize' doesn't scan mathboxes by 
default.

Perhaps not surprisingly, the majority of theorems (~76%) take less than 15 
seconds to minimize. The mean minimization time is ~45s and the standard 
deviation is ~228s, so already the "formula" predicting 0s for every theorem 
gives us a "small" mean error :-) Intuitively it seems that the formula 
ought to look like c*label^a*size^b, where a, b and c are some constants. I 
tried various curve-fitting techniques available in Wolfram Mathematica; 
for instance, it suggested the following: 
0.0000811892*label^0.405776*size^1.33858, which has mean error ~35 and 
standard deviation ~155. A simpler hand-found formula, 
4.33e-9*label*size^1.7, produces a comparable mean error ~25 and standard 
deviation ~180.
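
For reference, such a power law can also be fitted without Mathematica. 
Below is a minimal Python sketch (assuming NumPy) that fits 
c*label^a*size^b by least squares in log space; note that this minimizes 
relative rather than absolute error, so the coefficients it finds will 
differ somewhat from the ones quoted above:

```python
import numpy as np

def fit_power_law(labels, sizes, times):
    """Fit time ~ c * label**a * size**b via least squares on
    log(time) = log(c) + a*log(label) + b*log(size)."""
    X = np.column_stack([np.ones(len(labels)), np.log(labels), np.log(sizes)])
    y = np.log(np.maximum(times, 0.5))  # clamp sub-second run times
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.exp(coef[0]), coef[1], coef[2]  # c, a, b
```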

These formulae have a small "global" coefficient (~1e-9 in the latter 
example), which suggests that perhaps one should differentiate between the 
small-run-time and large-run-time cases. I set the "small" class to be 
theorems taking less than 15s to minimize and trained a basic linear 
classifier to distinguish it from the "large" class. I then curve-fitted a 
formula of the form c*label^a*size^b separately in the small and large 
cases to obtain the following prediction formula:

If -0.0572153 + 2.68348e-6*label + 0.000206234*size < 0.5 then 
4.92e-6*label^0.68*size^1.03 else 0.000479*label^0.32*size^1.25,

which has mean error ~28 and standard deviation ~156. Clearly this is not a 
large improvement over the simpler formulae listed above.
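
In code, this prediction formula is just a few lines. Here is a Python 
transcription, using the fitted coefficients verbatim:

```python
def predict_time(label, size):
    """Predicted minimization time (seconds) from label number and proof
    size: a linear classifier picks the 'small' or 'large' run-time
    branch, then the corresponding power-law fit is applied."""
    if -0.0572153 + 2.68348e-6 * label + 0.000206234 * size < 0.5:
        return 4.92e-6 * label ** 0.68 * size ** 1.03   # small-time branch
    return 0.000479 * label ** 0.32 * size ** 1.25      # large-time branch
```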

That is more or less what I have for approximating minimization run times. 
Now, the intended application of such an approximation is scheduling 
minimization jobs evenly, so let's switch our attention to that.

I assumed that we have 16 CPUs (processes, threads, etc) available, and we 
schedule our theorems for minimization by the Longest Processing Time (LPT) 
algorithm (each task is assigned to the CPU with the smallest total load). 
During the scheduling phase we obviously estimate times by one of our 
formulae, but after the theorems have been distributed across CPUs, we can 
evaluate our procedure by computing the real run times. The relevant 
parameters are the average CPU load, which is always ~79495s, and the 
latency, which I define to be the difference between the maximum and 
minimum CPU time (the smaller the latency, the closer the CPU loads are). 
It is useful to compare our results 
with a straightforward scheduling approach where one randomly partitions 
theorems into 16 (almost) equal lists. In my experiments such an approach 
yields latency ~39397, so it can be taken as a baseline.
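
The experiment can be sketched in a few lines of Python. The heap-based 
greedy below is the standard LPT implementation (not necessarily what I 
actually ran); each job is assumed to be a (predicted_time, real_time) 
pair, so scheduling uses the formula's estimate while evaluation uses the 
real run time:

```python
import heapq

def lpt_schedule(jobs, n_cpus=16):
    """LPT: process jobs in decreasing order of predicted time, always
    assigning to the CPU with the smallest predicted load so far.
    Each job is a (predicted_time, real_time) pair; returns the
    resulting real load per CPU."""
    heap = [(0.0, cpu) for cpu in range(n_cpus)]  # (predicted load, cpu id)
    heapq.heapify(heap)
    real_loads = [0.0] * n_cpus
    for predicted, real in sorted(jobs, reverse=True):
        load, cpu = heapq.heappop(heap)
        real_loads[cpu] += real
        heapq.heappush(heap, (load + predicted, cpu))
    return real_loads

def latency(loads):
    """Difference between the most and least loaded CPU."""
    return max(loads) - min(loads)
```

Using proof size itself as the predictor then simply corresponds to jobs of 
the form (size, real_time).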

So, the first formula (together with LPT scheduling) produces the following 
list of CPU run times:

[76126,83042,81573,90121,72944,79387,74600,84677,78886,69388,86309,83830,69727,74918,81779,84567],

which has latency ~20733 and standard deviation ~6055. The second (hand-found) 
formula gives latency ~32342 and standard deviation ~8781, which is an 
arguably worse result (the latency is not much different from the random 
approach). Finally, the last (piecewise) formula gives latency ~40756 and 
standard deviation ~9684, which is worse than the random approach.

What is, in my opinion, surprising is that the simpler formula 'size' (i.e. 
minimization time is proportional to the proof size; notice that we don't 
need a multiplicative constant since we only sum and compare values in our 
scheduling algorithm) produces latency ~19218 and standard deviation ~6388, 
which is on par with the first formula found by curve-fitting. I have tried 
other simple expressions without great success, 
so this seems like the best result I can get.

That is how far I could get with my basic knowledge of statistics. It seems 
the end result is that one can simply estimate a theorem's minimization 
time by its proof size and then use well-known scheduling algorithms to 
assign jobs :-) I couldn't find any reports on the 
minimization run performed by Dylan Houlihan (for instance, I don't know 
the "latency" parameter for that run or even the number of threads used), 
so I don't know whether the approach I'm suggesting has any benefits over 
the method used by Norman and Dylan. Nevertheless, I hope this small report 
was interesting to read and maybe it will motivate other people to try to 
come up with a better formula and/or approach. Here's the dataset I used: 
https://gist.github.com/savask/900161e9d093a5b787b6473113dfde16. I'm also 
looking forward to any questions or criticism regarding approaches used 
here. I can't boast a good knowledge of statistics or data science, so my 
"analysis" should be taken with a grain of salt; I will be happy to hear 
other people's opinions.
