Long away and far ago (or something like that;), there was a discussion on Lists implemented as arrays rather than linked structures, during which Jerzy Karczmarczuk commented:
> What bothers me quite strongly is the algorithmic side of operations
> upon such objects.
>
> Typical iterations map- (or zip-) style: do something with the head, pass
> recursively to the tail, would demand "intelligent" arrays, with the indexing
> header detached from the bulk data itself. The "consumed" part could not be
> garbage collected. In a lazy language this might possibly produce a considerable
> amount of rubbish which otherwise would be destroyed quite fast. The
> concatenation of (parts of) such lists might also have very bad behaviour.
>
> Can you calm my anxiety?
>
> Jerzy Karczmarczuk

The reason I wanted to reply is that I can offer one data point on this. An even longer time ago, there were the various reduction systems developed in Kiel, implementing the Kiel Reduction Language KiR, a variant of Berkling's reduction languages (the Berkling pointed to in Backus' Turing Award Lecture). KiR lacked lots of useful features Haskell has, and Haskell's implementations still lack lots of useful features KiR's had. I dearly miss those features, but they are not the topic here (I don't know whether any of the systems still install or even run, but see the manual at http://www.informatik.uni-kiel.de/~base/ for more info). The topic here is list representations.

KiR's implementations moved from interpreted graph reduction, through compiled graph reduction with a code interpreter, to compiled graph reduction with compilation via C, all more or less with the same high-level front end. All of them represented lists as vectors (KiR was dynamically and implicitly typed, btw), and memory was divided into two areas: one for fixed-size descriptors, pointing to each other or into the second area, the heap, which held variable-sized data blocks. The descriptor area was reference-counted and simply reused free space (most KiR variants were call-by-value); the heap needed compaction when it grew fragmented.

A list's elements (or pointers to their descriptors) went into a contiguous block in the heap, and the descriptors made it possible to share subsequences of elements between different lists (descriptors were large enough to hold a pointer to the start of the sequence and its length). Quite as Jerzy suspected. Supported operations included both array and list operations; append required the allocation of a new heap block and copying of *both* lists, but was provided as a primitive (the standard approach for systems that started out as interpreters: a good mix of efficient primitives and interpreted user-defined constructs).

As others have pointed out, this looks rather inefficient, especially for longer lists, so when we set out to make measurements for a JFP paper [1], comparing with the likes of ghc, we expected to be beaten, but hoped not to be too far away, at least with the latest via-C implementation.

Benchmarks are always difficult, but especially so between such different languages: in KiR, we could easily and correctly execute programs that, in Haskell, either wouldn't even compile, or wouldn't terminate, or wouldn't show any result (with similar problems the other way round). And even after adapting to the common subset of algorithms, a translation to Haskell might mean that a complex program returned immediately, as the compiler and runtime system lazily determined that none of it was needed for the program output (compiled Haskell programs report almost no reduction results, only explicit program output, or show-able results).
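To make that representation a bit more concrete, here is a minimal sketch of such a descriptor-based list in today's Haskell, using Data.Vector as a stand-in for the heap blocks (VList, hd, tl and app are names made up for illustration, not KiR's):

    import qualified Data.Vector as V

    -- A "descriptor": a fixed-size view onto a shared heap block,
    -- holding the block, a start offset, and a length.
    data VList a = VList { block :: V.Vector a, off :: Int, len :: Int }

    fromL :: [a] -> VList a
    fromL xs = let v = V.fromList xs in VList v 0 (V.length v)

    -- head and tail are O(1): the tail is just a new descriptor into
    -- the same block, so subsequences are shared between lists, but the
    -- "consumed" prefix cannot be reclaimed while any suffix is live.
    hd :: VList a -> a
    hd (VList v o _) = v V.! o

    tl :: VList a -> VList a
    tl (VList v o n) = VList v (o + 1) (n - 1)   -- no emptiness check, sketch only

    -- append allocates a fresh block and copies *both* operands.
    app :: VList a -> VList a -> VList a
    app (VList v1 o1 n1) (VList v2 o2 n2) =
      VList (V.slice o1 n1 v1 V.++ V.slice o2 n2 v2) 0 (n1 + n2)

Note how tl merely creates a new descriptor with a bumped offset, so suffixes share the block but keep the consumed prefix alive (Jerzy's garbage-collection worry), while app really does copy both operands into a fresh block.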
With all these preliminaries and caveats, and the standard disclaimer that all benchmarks are useless, but interesting: the relevant benchmark is the infamous "pretty quicksort" on some 4000 elements (qusort in the paper - lots of finite lists, traversals, fragments and appends, just like the typical Haskell program written without any concern for efficiency; Haskell programs concerned with efficiency tend to look rather different).

To our astonishment, even the code-interpreting implementation (which should otherwise be in the ballpark of Hugs) outperformed ghc, hbc, and Clean on this example (call-by-value also played a role: compiled SML was in the same area as compiled KiR, but both were only slightly faster than code-interpreted KiR, so data representation and primitives seemed to play the main role). This prompted us to include Augustsson's sanitized variant of quicksort (qusortbin in the paper - from the hbc libs) as well, which gave the results everyone expected (it substantially modifies the algorithm to a profile better supported by the current list representation, e.g., no appends; see the PS below for a sketch of both variants). [and before anyone accuses me of advocating functional quicksort: the naive quicksort is useless, and even the real one isn't the best choice in many cases;-]

But the moral for the current discussion: a more intelligent list representation could have substantially more benefits for the average Haskell program than any amount of compiler-optimization twiddling, and I'd really like to see someone (PhD student?) investigate that topic seriously, as the answers are unlikely to be obvious. The representation chosen in the reduction systems could be a first hint, but, as Jerzy points out, things may be more complicated in the context of Haskell.

For comparison, Haskell array performance was somewhere between non-existent and terrible in those days (another clear win for both the compiled and the interpreted reduction systems) and has only recently improved somewhat. That needs to continue and, please, someone do the same for lists!

Just my old 2 Pfennige (former currency;-),

Claus

[1] D. Gaertner, W. Kluge: pi-RED+: An Interactive Compiling Graph Reduction System for an Applied Lambda-Calculus. Journal of Functional Programming 6(5), 1996.
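PS. For those without the paper at hand, here is roughly what the two benchmarked variants look like; a sketch under names of my own choosing, with the accumulator version written in the spirit of the hbc-library qsort rather than copied from it:

    import Data.List (partition)

    -- qusort-style: the "pretty" quicksort, building lots of short
    -- intermediate lists and appending them.
    qsortNaive :: Ord a => [a] -> [a]
    qsortNaive []     = []
    qsortNaive (p:xs) =
      qsortNaive [x | x <- xs, x < p] ++ p : qsortNaive [x | x <- xs, x >= p]

    -- qusortbin-style: an accumulator-passing variant that avoids (++)
    -- entirely by consing the sorted result directly onto the accumulator.
    qsortAcc :: Ord a => [a] -> [a]
    qsortAcc xs = go xs []
      where
        go []     acc = acc
        go (p:ys) acc = go smaller (p : go larger acc)
          where (smaller, larger) = partition (< p) ys

The second version never builds intermediate lists just to append them, which is exactly the profile that linked cons-lists support well and the vector representation does not.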