>>>>> "Jonathan" == Jonathan King <[EMAIL PROTECTED]> writes:
Jonathan> On Thu, 23 Sep 1999, Manuel M. T. Chakravarty wrote:
>> [EMAIL PROTECTED] (Marcin 'Qrczak' Kowalczyk) wrote,
>>
>> > S.D.Mechveliani <[EMAIL PROTECTED]> pisze:
>> > > So far, no clear progrm example appeared in this list to demonstrate
>> > > Haskell's in-efficiency in comparison to other languages.
>> >
>> > I have not done benchmarking myself yet, but in
>> > <http://www.brookes.ac.uk/~p0071749/papers/bridging.ps.gz>
>> > they describe an algorithm for text formatting.
>> >
>> > | lines | chars | size(KB) | time(s) | memory(KB) |
>> > ----------+-------+-------+----------+---------+------------+
>> > Haskell | 163 | 4640 | 410 | 45.2 | 4505 |
>> > Modula-2 | 551 | 10005 | 74 | 18.6 | 356 |
>> > C++ | 389 | 5832 | 328 | 3.9 | 368 |
>> >
>> > It is not quite fair because in Modula-2 and C++ all data structures
>> > were of fixed size, but...
>>
>> This may lead to a number of different conclusions:
>> * Haskell is hopeless,
>> * the author of the program has no clue about Haskell,
>> * the Haskell compiler is hopeless,
>> * the Haskell interpreter is really only an interpreter, or
>> * the author forgot to pass -O when calling the Haskell
>> compiler.
Jonathan> The original URL cited by Kowalczyk seems to be dead, but you can get
Jonathan> what I believe is an incarnation of the same work in:
Jonathan> http://www.brookes.ac.uk/~p0071749/publications.html#para
Jonathan> Please note that this paper does not actually concentrate on a "Haskell
Jonathan> vs.the world" comparison, but on the derivation of a new, linear time
Jonathan> paragraph formatting algorithm. The date on it was September 9, 1999, so
Jonathan> this is pretty much "hot off the press".
Jonathan> This version of the paper drops the Modula-2 line and the memory size
Jonathan> column; the rest of the new table would look like this:
Jonathan> | lines | chars | size(KB) | time(s) |
Jonathan> ---------------------+-------+-------+----------+----------+
Jonathan> Haskell ghc 4.01 | 183 | 4676 | 453 | 4.72 |
Jonathan> Haskell hbc 0.9999.4 | 183 | 4676 | 196 | 10.34 |
Jonathan> C++ g++ 2.90.27 | 416 | 6310 | 8 | 0.43 |
Jonathan> The C++ code was a direct (by hand) translation of the haskell, except
Jonathan> that they did things the way a C++ programmer would do them: using
Jonathan> destructive array updates, and data structures that were of fixed size
Jonathan> for a given problem. The code size of the C++ version suggests it was
Jonathan> dynamically linked to the usual stuff.
Jonathan> The text they formatted was the ASCII version of _Far from the Madding
Jonathan> Crowd_; clearly a problem of decent size.
Jonathan> You can probably learn a lot more from reading the whole paper, and
Jonathan> presumably from playing with the code. But that's still a factor of 10
Jonathan> speed difference between the ghc version and the C++ version. This is
Jonathan> actually a bit encouraging, I think, but does point out some possible room
Jonathan> for improvement. If I had to make a wild guess, I'd bet the major source
Jonathan> of performance loss is in the haskell data structures used. While the
Jonathan> pedagogical niceness of defining String to be [Char] is obvious, I know
Jonathan> that (at least in Hugs98) this can really ruin your day.
Jonathan> (Some of you might like to try doing:
Jonathan> strace -c your_favorite_standalone_hugs_texthack
Jonathan> and note what you see. Then write the equivalent 5-line perl script (:-))
Jonathan> and do the strace exercise again. Ouch. But since the next version of
Jonathan> Hugs98 is apparently due out imminently, this situation may well have
Jonathan> improved in the new version.)
Jonathan> But, man, things are getting *this close*; if it were really
Jonathan> true that all ghc needs right now to get in line with C++ on
Jonathan> text-formatting problems is a kicking string library and a
Jonathan> way for the compiler to know when to use it, then the
Jonathan> arguments usually used in favor become much more compelling
Jonathan> to a far larger audience.
In his masters thesis,
http://www.ki.informatik.uni-frankfurt.de/data/klose_diplom.ps.gz
Norbert Klose implemented an packing transformation as a Haskell
preprocessor. It transforms the list data structure by adding an
additional constructor for packs of elements and then transforms all
code using lists to use the new data structure and to keep data as
packed as possible. This included using packing prelude functions for
IO etc. He compared the running times of some programs when packing
from one (like list) to 6 elements together. Almost in every case
there is an improvement in running time, e.g. for `wc` the running
time of the 4-pack version takes 36% of the running time of the 1-pack
version. For some tests though, HBCs native lists were still better
than the n-packs. E.g. for his length and filter tests while 6-packs
reduced running time to 32% and 61%, repectively, HBCs native lists
were still faster. On the other hand there were also tests where even
the 1-pack version outperformed the version using HBCs native
lists. E.g. insertion sort, quicksort, sorting with splaytrees, wc,
primes.
Norbert used HBC because at the time (2 years ago) it seemed there
were some technical difficulties when he tried using GHC on the code
generated. IIRC then GHC uses foldr/build and some other optimizations
not present in HBC, but correct me if I am wrong. I would expect some
overlap in the cases that these optimzations and those that the packing
transformation improve, so the potential for improvement over GHCs
native lists might not be as big as that over HBCs.
So it seems that there is quite some room for improvement, but
achieving a factor of 10 on text-formatting problems by a clever
string library and not sacrificing programmer convenience too much,
seems hard to do. But even a factor of 2 would be great!
Marko
--
Marko Sch�tz [EMAIL PROTECTED]
http://www.ki.informatik.uni-frankfurt.de/~marko/