Hi Mark,
> To paraphrase George Orwell, "All reductions are equal, but some are
> more equal than others."
:-)
So are assembly language instructions. Yet one could still think
of some average instruction cost for similarly written programs.
Naive as my question might have been, I asked it anyway,
in the hope of receiving some insight into how to compare
similar algorithms in Hugs.
Well, I realize that this can easily be done by comparing
execution times of compiled programs. Yet when I think of
incremental development of algorithms, nothing beats the interpreter.
If only there were a way to make such comparisons properly!
You quoted from the Hugs manual:
> One reasonable use of the statistics produced by +s would be
> to observe general trends in the behaviour of a single algorithm
> with variations in its input.
Sure, this makes sense to me. For example, the number of reductions
for a lookup of the k-th element of a standard list, as in [0..]!!k,
can be approximated by
14*k + 14
So there is indeed a trend here. And a linear one, too!
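The linear trend is no surprise if one looks at how (!!) works.
Ignoring the Prelude's checks for negative and out-of-range indices,
its definition is essentially the following (renamed here so it can
be loaded alongside the Prelude); each recursive step touches one
cons cell, which is presumably where the constant 14 per element
comes from:

-- essentially the Prelude's (!!), minus the error cases
index :: [a] -> Int -> a
(x:_)  `index` 0 = x
(_:xs) `index` n = xs `index` (n - 1)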
Now, when I substitute one of Chris Okasaki's random access
lists for the standard list, say BinaryList, there is also
a trend:
number of reductions = alpha * log k + initialization,
where
initialization = 14*r + 45
and r is the size of the list (a minimum list size of about 10
is needed for this to be more or less true).
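For reference, here is a minimal sketch of a binary random access
list in the style of Okasaki; the names are mine, and the actual
BinaryList module may well differ in detail:

data Tree a  = Leaf a | Node Int (Tree a) (Tree a)
data Digit a = Zero | One (Tree a)
type RList a = [Digit a]

size :: Tree a -> Int
size (Leaf _)     = 1
size (Node w _ _) = w

-- cons pushes a singleton tree through the digits, linking trees
-- of equal size, like incrementing a binary number
cons :: a -> RList a -> RList a
cons x = consTree (Leaf x)
  where
    consTree t []            = [One t]
    consTree t (Zero   : ts) = One t : ts
    consTree t (One t' : ts) = Zero  : consTree (link t t') ts
    link t1 t2 = Node (size t1 + size t2) t1 t2

fromList :: [a] -> RList a
fromList = foldr cons []

-- lookup skips whole trees, then descends into one: O(log n)
lookupR :: Int -> RList a -> a
lookupR i (Zero  : ts) = lookupR i ts
lookupR i (One t : ts)
  | i < size t = lookupTree i t
  | otherwise  = lookupR (i - size t) ts
  where
    lookupTree 0 (Leaf x) = x
    lookupTree j (Node w t1 t2)
      | j < w `div` 2 = lookupTree j t1
      | otherwise     = lookupTree (j - w `div` 2) t2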
Contrary to what Chris suggests, this random access list
does not look so efficient, because of the initialization needed:
the list must first be defined in its entirety before a lookup can
be attempted, while the standard list may be lazily defined
as an infinite structure. Of course, if a function that uses the
random access list does a lot of lookups, then the savings will be
apparent. But for casual usage the standard lists seem more
efficient.
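To make the point concrete, using the fromList/lookupR names from
the sketch above:

demoLazy :: Integer
demoLazy = [0..] !! 5
-- fine: only the first six cells of [0..] are ever built

-- demoStrict = lookupR 5 (fromList [0..])
--   never terminates: cons pattern-matches on the rest of the
--   digit list, so the whole spine must exist before any lookup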
The question remains: does the Orwellian law hold here? Am I
comparing apples to oranges? Maybe, but what choice do I have
in Hugs? Should I use the timer functionality? There is
a note in "timer.c" that discourages that too. (The note is
appended below.) I wonder whether the strong wording of that note
has a real justification, or is it just a formal disclaimer?
Yet when I take the two measures, the reduction count and
the timing, both show the same trend.
A better example is a comparison of several implementations
of the scalar product of two vectors with Double components,
since this operation can be considered a primitive for other
numerical algorithms. The data below is for vectors of size 100;
sketches of the variants follow the table.
   Container     Uses indexing?   Reductions   Time (ms)
   ------------------------------------------------------------
1. Array         Yes                   8,064      120
2. List          Yes                 102,930      830  (GC occurred)
3. List          No                    1,421       80
4. BinaryList    Yes                  20,641      230
5. BinaryList    No                    5,755      120
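In rough outline, the variants look like this (simplified sketches,
not necessarily the exact code measured; the BinaryList versions
are analogous, using its own lookup and fold):

import Array  -- Haskell 98 module name, as in Hugs

-- #1: Array with indexing
dotArray :: Array Int Double -> Array Int Double -> Double
dotArray u v = sum [ u!i * v!i | i <- range (bounds u) ]

-- #2: List with indexing via (!!)
dotListIx :: [Double] -> [Double] -> Double
dotListIx u v = sum [ (u !! i) * (v !! i) | i <- [0 .. length u - 1] ]

-- #3: List without indexing
dotList :: [Double] -> [Double] -> Double
dotList u v = sum (zipWith (*) u v)

-- e.g. dotList [1..100] [1..100]
--      dotArray (listArray (0,99) [1..100]) (listArray (0,99) [1..100])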
I somehow have a feeling that #3 is the most efficient method here.
But since this is still a guess, let me rephrase my original question:
if neither the reduction count nor the timing is an appropriate
measure of efficiency in Hugs, then what is? Is there any
profiling tool available for the interpreter?
Jan
--------- Excerpt from timer.c in Hugs source code distribution ------
"This file provides a simple mechanism for measuring elapsed time on
Unix machines (more precisely, on any machine with an rusage() function).
A somewhat limited version for other systems is also included, believed
to be ANSI compatible, but not guaranteed ...
It is included in the Hugs distribution for the purpose of benchmarking
the Hugs interpreter, comparing its performance across a variety of
different machines, and with other systems for similar languages.
To make use of these functions, use the --enable-timer when configuring
Hugs or change the setting of "WANT_TIMER" in config.h and recompile
Hugs.
It would be somewhat foolish to try to use the timings produced
in this way for anything other than the purpose described above.
In particular, using timings to compare the performance of different
versions of an algorithm is likely to give very misleading results.
The current implementation of Hugs as an interpreter, without any
significant optimizations, means that there are much more significant
overheads than can be accounted for by small variations in Hugs code."