Leon Smith wrote:
Ok, after mulling over the issues that Will Ness has brought up in
the last few days [1], I think I have a partial explanation for the
apparent tension between Will's observations and Heinrich Apfelmus's
Implicit Heaps article [2], which both concern the implementation of
mergeAll [3].
[1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
[2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
[3]
http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html#v:mergeAll
[...]
This raises the question, is there some
combination of the shape of the merge' tree and some input for which
using VIPs dramatically changes the efficiency of a mergeAll
operation? I suspect the answer is yes, though I don't know for
sure at this point in time.
However, I do tacitly believe that the current tree that mergeAll
uses doesn't exhibit this property for any input, and so I have
simplified the implementations of mergeAll and unionAll in the latest
version of data-ordlist-0.4.4 by avoiding the use of VIPs. This has
the nice side benefit of modestly improving performance when the
elements of the result are highly biased towards the first list.
Will Ness
For those who remember the discussion about this about a year ago, it turns out
there was a simpler version after all lurking somewhere in there (or is it
_out_?).
primes = 2: primes'
where
primes' = 3: 5: [7,9..] `minus` tfold
[ [p*p,p*p+2*p..] | p <- primes' ]
tfold ((x:xs):t) = x : xs `union` tfold (pairs t)
pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t
Unfortunately, it turns out that this program is clear, shorter ... and
subtly wrong. :)
Here an example where the VIP merge would give a different result
bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
error "bad"
We have
ghci> bad
[1,2*** Exception: bad
but the VIP version would give at least
ghci> bad
[1,2,3,4*** Exception: bad / Prelude: undefined
In other words, this new program already tries to compare the number 3
to the fourth list when it is still clear that only the first three
lists are relevant.
Note that this doesn't necessarily mean that the program does not work
for prime numbers, but *proving* correctness is now considerably more
difficult because you need estimates about the growth and distribution
of prime numbers. The VIP version always works as long as there are
infinitely many primes.
Also, since unnecessary comparisons are performed, it is no longer clear
whether the time and space complexity stays the same. (Which is not as
bad as it sounds, since we didn't know them well in the first place
anyway). More worryingly, changing the tree shape now affects correctness.
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe