Leon Smith wrote:
Ok,  after mulling over the issues that Will Ness has brought up in
the last few days [1],  I think I have a partial explanation for the
apparent tension between Will's observations and Heinrich Apfelmus's
Implicit Heaps article [2],  which both concern the implementation of
mergeAll [3].

[1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
[2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
[3]
http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html#v:mergeAll

[...]

This raises the question,  is there some
combination of the shape of the merge' tree and some input for which
using VIPs dramatically changes the efficiency of a mergeAll
operation?   I suspect the answer is yes,  though I don't know for
sure at this point in time.

However,  I do tacitly believe that the current tree that mergeAll
uses doesn't exhibit this property for any input,   and so I have
simplified the implementations of mergeAll and unionAll in the latest
version of data-ordlist-0.4.4 by avoiding the use of VIPs.   This has
the nice side benefit of modestly improving performance when the
elements of the result are highly biased towards the first list.

Will Ness
For those who remember the discussion about this about a year ago, it turns out there was a simpler version after all lurking somewhere in there (or is it _out_?).

primes = 2: primes' where
    primes' = 3: 5: [7,9..] `minus` tfold
[ [p*p,p*p+2*p..] | p <- primes' ] tfold ((x:xs):t) = x : xs `union` tfold (pairs t)
    pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t

Unfortunately, it turns out that this program is clear, shorter ... and subtly wrong. :)

Here an example where the VIP merge would give a different result

    bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
                  error "bad"

We have

    ghci> bad
    [1,2*** Exception: bad

but the VIP version would give at least

    ghci> bad
    [1,2,3,4*** Exception: bad / Prelude: undefined

In other words, this new program already tries to compare the number 3 to the fourth list when it is still clear that only the first three lists are relevant.


Note that this doesn't necessarily mean that the program does not work for prime numbers, but *proving* correctness is now considerably more difficult because you need estimates about the growth and distribution of prime numbers. The VIP version always works as long as there are infinitely many primes.

Also, since unnecessary comparisons are performed, it is no longer clear whether the time and space complexity stays the same. (Which is not as bad as it sounds, since we didn't know them well in the first place anyway). More worryingly, changing the tree shape now affects correctness.



Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to