The bottom line is that

- in logic programming languages, building a list by working on
  a pair of arguments representing a segment of the list is the
  NORMAL way to build a list; it's as fast as it gets, and the
  list is inspectable during construction.

modulo usage patterns: e.g., mostly linear use.

- at least in SML, the (fn ys => xs @ ys)  [Haskell: \ys -> xs ++ ys]
  approach is stunningly slow, so slow as to make even naive use of
  append more attractive for most uses of; it is not the normal way
  to build lists, and for good reason; you can do nothing with a list
  so represented except compose and apply.

Even in your SML code, the "boring old plain lists" seemed to be list_flatten, which uses difference lists in disguise, and won
your little test, right? Using Haskell notation:

flat (LEAF x) ys = x : ys
flat (FORK(a,b)) ys = flat a (flat b ys)
-->
flat (LEAF x)  = \ys->x : ys
flat (FORK(a,b)) = \ys->flat a (flat b ys)
-->
flat (LEAF x)  = (x :)
flat (FORK(a,b)) = flat a . flat b

As in Prolog, it is often better not to make the structure
explicit, though I am slightly disappointed that GHC's
optimizer doesn't give the same performance for the
two versions (when summing a flattened strict tree in Haskell, I get roughly a factor of 2 between naive list append, explicit diff lists, as in the lower version of flat, and hidden diff lists, as in your list_flatten, the latter being fastest).

Of course, one has to be careful about usage patterns and
efficiency considerations. For instance, head and tail with
Haskell diff lists are not O(n), as you mentioned, because
of non-strictness. And building up lots of nested operations
under a lambda means that many of those operations are not likely to be shared, but repeated every time the lambda is applied (such as converting back to plain lists), so one has to be careful about not doing that too often. etc.

The practical consequence of this is that calling both techniques by
the same name is going to mislead people familiar with one kind of
language when they use the other:  logic programmers will wrongly
expect dlists to be practically useful in functional languags,
functional programmers will expect them to be impractical in logic
programming languages.

I do tend to expect a little more insight from Haskellers, but
perhaps you're right. Having a pre-canned library instead of
writing out the near-trivial code patterns oneself, one might be surprised when expected miracles don't happen (diffarrays
were one such case for me;-).

I'd still call both patterns difference lists, because it can be
useful to know about the connections, but if you could suggest
text for the DList documentation that warns of the differences,
and of performance pitfalls, I'm sure the package author would
be happy to add such improvements.

If we ever get per-package wikis for hackage, you could add such comments yourself.

Claus


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to