Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2011-01-01 Thread Will Ness
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

 
 Will Ness wrote:
  Heinrich Apfelmus writes:
  Here an example where the VIP merge would give a different result
 
   bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
 error bad
  
  The reason to *not* have the lazy patterns in foldTree for primes, as 
Daniel 
  Fischer discovered back then, is that they give it a space leak. Case in 
point -
   http://ideone.com/DLHp2 : 
   
  [...]
   
 tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)
   where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t
   
 hfold  xs = serve . foldTree  mergeP . map vip $ xs
 hfold' xs = serve . foldTree' mergeP . map vip $ xs
   
 foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
   where pairs ~(x: ~(y:ys)) = f x y : pairs ys
   
 foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)
   where pairs (x: (y:ys)) = f x y : pairs ys
 
  [...] 
   
  so hfold' too appears to be over-eager to-the-right, although still more 
  productive than tfold.
 
 Ah, the lazy patterns in  foldTree  are a different issue, sorry for my 
 bad choice of example.
 
 While I still don't understand the trade-off that turns the lazy 
 patterns into a space leak, there is no harm in allowing  foldTree  to 
 see the complete spine of the list. What we do want to forbid is looking 
 at the elements of that list too early. 

This turns out to be too optimistic a demand on data, in general. 

 In other words, the example 
 should read
 
  bad = tfold $
  (1:10:undefined) : (2:3:5:undefined) : (4:undefined)
  : repeat (error bad : undefined)
 
 i.e. the previously unknown tail is replaced with an infinite list of 
 undefined elements. This example can properly distinguish between the 
 not-so-correct  tfold  and proper VIP implementations (or other 
 implementations that don't do unnecessary comparisons).
 


will have to think this over, but in the mean time, they *both* turn out to be 
*completely* and utterly *wrong* :) in a general case (although probably for 
different reasons).

Here's where:


 *Main mapM_ print $ take 5 $ map (take 10) 
[concatMap (replicate 3) [n,n+1..]|n-[1..]]
 [1,1,1,2,2,2,3,3,3,4]
 [2,2,2,3,3,3,4,4,4,5]
 [3,3,3,4,4,4,5,5,5,6]
 [4,4,4,5,5,5,6,6,6,7]
 [5,5,5,6,6,6,7,7,7,8]

 *Main take 20 $ hfold [concatMap (replicate 3) [n,n+1..]|n-[1..]]
 [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7]

 *Main take 20 $ tfold [concatMap (replicate 3) [n,n+1..]|n-[1..]]
 [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7]


when it should'a been 


 [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]


Cheers,

:)


 Regards,
 Heinrich Apfelmus
 
 --
 http://apfelmus.nfshost.com
 





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2011-01-01 Thread Will Ness
Will Ness will_n48 at yahoo.com writes:

 
 ... they *both* turn out to be 
 *completely* and utterly *wrong* :) in a general case (although probably for 
 different reasons).


Sorry, my bad. Thought in terms of merge, but the definiton used in VIP code 
was really an union.

When definition was changed to a real merge, non-removing of duplicates, 
everything was as expected in that case, for both versions.


  *Main take 20 $ hfold [concatMap (replicate 3) [n,n+1..]|n-[1..]]
  *Main take 20 $ tfold [concatMap (replicate 3) [n,n+1..]|n-[1..]]

  [1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4]


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2010-12-31 Thread Heinrich Apfelmus

Will Ness wrote:

Heinrich Apfelmus writes:

Here an example where the VIP merge would give a different result

 bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
   error bad
We have
 ghci bad
 [1,2*** Exception: bad
but the VIP version would give at least
 ghci bad
 [1,2,3,4*** Exception: bad / Prelude: undefined


The reason to *not* have the lazy patterns in foldTree for primes, as Daniel 
Fischer discovered back then, is that they give it a space leak. Case in point -
 http://ideone.com/DLHp2 : 
 
[...]
 
   tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)

 where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t
 
   hfold  xs = serve . foldTree  mergeP . map vip $ xs

   hfold' xs = serve . foldTree' mergeP . map vip $ xs
 
   foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)

 where pairs ~(x: ~(y:ys)) = f x y : pairs ys
 
   foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)

 where pairs (x: (y:ys)) = f x y : pairs ys

[...] 
 
so hfold' too appears to be over-eager to-the-right, although still more 
productive than tfold.


Leon Smith wrote:

At a glance, it appears to have to do with the irrefutable patterns
that treefold uses. At the moment,  it's not obvious to me how your
example could be made to work without either giving up the ability
to work correctly on finite cases, or giving up the tree and going
back to foldr merge'.  ;-)

[...]

Also,  is there other examples that could distinguish a VIP
implementation that works on finite cases,   and a no-VIP
implementation?   Is it possible to have a VIP example that works on
finite cases and is maximally lazy in this example?


Ah, the lazy patterns in  foldTree  are a different issue, sorry for my 
bad choice of example.


While I still don't understand the trade-off that turns the lazy 
patterns into a space leak, there is no harm in allowing  foldTree  to 
see the complete spine of the list. What we do want to forbid is looking 
at the elements of that list too early. In other words, the example 
should read


bad = tfold $
(1:10:undefined) : (2:3:5:undefined) : (4:undefined)
: repeat (error bad : undefined)

i.e. the previously unknown tail is replaced with an infinite list of 
undefined elements. This example can properly distinguish between the 
not-so-correct  tfold  and proper VIP implementations (or other 
implementations that don't do unnecessary comparisons).



Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2010-12-31 Thread Heinrich Apfelmus

Will Ness wrote:

Heinrich Apfelmus writes:

Here an example where the VIP merge would give a different result

 bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
   error bad
We have
 ghci bad
 [1,2*** Exception: bad
but the VIP version would give at least
 ghci bad
 [1,2,3,4*** Exception: bad / Prelude: undefined


The reason to *not* have the lazy patterns in foldTree for primes, as Daniel 
Fischer discovered back then, is that they give it a space leak. Case in point -
 http://ideone.com/DLHp2 : 
 
[...]
 
   tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)

 where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t
 
   hfold  xs = serve . foldTree  mergeP . map vip $ xs

   hfold' xs = serve . foldTree' mergeP . map vip $ xs
 
   foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)

 where pairs ~(x: ~(y:ys)) = f x y : pairs ys
 
   foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)

 where pairs (x: (y:ys)) = f x y : pairs ys

[...] 
 
so hfold' too appears to be over-eager to-the-right, although still more 
productive than tfold.


Leon Smith wrote:

At a glance, it appears to have to do with the irrefutable patterns
that treefold uses. At the moment,  it's not obvious to me how your
example could be made to work without either giving up the ability
to work correctly on finite cases, or giving up the tree and going
back to foldr merge'.  ;-)

[...]

Also,  is there other examples that could distinguish a VIP
implementation that works on finite cases,   and a no-VIP
implementation?   Is it possible to have a VIP example that works on
finite cases and is maximally lazy in this example?


Ah, the lazy patterns in  foldTree  are a different issue, sorry for my 
bad choice of example.


While I still don't understand the trade-off that turns the lazy 
patterns into a space leak, there is no harm in allowing  foldTree  to 
see the complete spine of the list. What we do want to forbid is looking 
at the elements of that list too early. In other words, the example 
should read


bad = tfold $
(1:10:undefined) : (2:3:5:undefined) : (4:undefined)
: repeat (error bad : undefined)

i.e. the previously unknown tail is replaced with an infinite list of 
undefined elements. This example can properly distinguish between the 
not-so-correct  tfold  and proper VIP implementations (or other 
implementations that don't do unnecessary comparisons).



Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2010-12-30 Thread Heinrich Apfelmus

Leon Smith wrote:

Ok,  after mulling over the issues that Will Ness has brought up in
the last few days [1],  I think I have a partial explanation for the
apparent tension between Will's observations and Heinrich Apfelmus's
Implicit Heaps article [2],  which both concern the implementation of
mergeAll [3].

[1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
[2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
[3] 

http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html#v:mergeAll


[...]

This raises the question,  is there some
combination of the shape of the merge' tree and some input for which
using VIPs dramatically changes the efficiency of a mergeAll
operation?   I suspect the answer is yes,  though I don't know for
sure at this point in time.

However,  I do tacitly believe that the current tree that mergeAll
uses doesn't exhibit this property for any input,   and so I have
simplified the implementations of mergeAll and unionAll in the latest
version of data-ordlist-0.4.4 by avoiding the use of VIPs.   This has
the nice side benefit of modestly improving performance when the
elements of the result are highly biased towards the first list.


Will Ness
For those who remember the discussion about this about a year ago, it turns out 
there was a simpler version after all lurking somewhere in there (or is it 
_out_?).


primes = 2: primes' 
   where

primes' = 3: 5: [7,9..] `minus` tfold
  [ [p*p,p*p+2*p..] | p - primes' ]   
tfold ((x:xs):t)= x : xs `union` tfold (pairs t)

pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t


Unfortunately, it turns out that this program is clear, shorter ... and 
subtly wrong. :)


Here an example where the VIP merge would give a different result

bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
  error bad

We have

ghci bad
[1,2*** Exception: bad

but the VIP version would give at least

ghci bad
[1,2,3,4*** Exception: bad / Prelude: undefined

In other words, this new program already tries to compare the number 3 
to the fourth list when it is still clear that only the first three 
lists are relevant.



Note that this doesn't necessarily mean that the program does not work 
for prime numbers, but *proving* correctness is now considerably more 
difficult because you need estimates about the growth and distribution 
of prime numbers. The VIP version always works as long as there are 
infinitely many primes.


Also, since unnecessary comparisons are performed, it is no longer clear 
whether the time and space complexity stays the same. (Which is not as 
bad as it sounds, since we didn't know them well in the first place 
anyway). More worryingly, changing the tree shape now affects correctness.




Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] V.I.P.s and the associativity of merge'

2010-12-30 Thread Will Ness
Heinrich Apfelmus apfelmus at quantentunnel.de writes:

 
 Leon Smith wrote:
 
  [1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
  [2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
  [3] 
 http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-
List-Ordered.html#v:mergeAll
 
 
 Will Ness
  
  primes = 2: primes' 
 where
  primes' = 3: 5: [7,9..] `minus` tfold
[ [p*p,p*p+2*p..] | p - primes' ]   
  tfold ((x:xs):t)= x : xs `union` tfold (pairs t)
  pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t
 
 Unfortunately, it turns out that this program is clear, shorter ... and 
 subtly wrong. :)
 
 Here an example where the VIP merge would give a different result
 
  bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
error bad
 
 We have
 
  ghci bad
  [1,2*** Exception: bad
 
 but the VIP version would give at least
 
  ghci bad
  [1,2,3,4*** Exception: bad / Prelude: undefined
 



The reason to *not* have the lazy patterns in foldTree for primes, as Daniel 
Fischer discovered back then, is that they give it a space leak. Case in point -
 http://ideone.com/DLHp2 : 
 

- 1M primes:  2M primes: --- 3M: --- ideone #:
- no-VIPs:
smart fold: 1.90s- 4.8MB  4.42s- 4.8MB  7.40s- 4.8MB  r3bdL
- VIPs:
smart fold: 1.95s- 4.8MB  4.53s- 4.8MB  7.45s- 4.8MB  4ACpe
simple  fold: 2.04s- 4.8MB  4.76s- 4.8MB  7.86s- 4.8MB  av9XR
lazy  pats: 2.44s-20.1MB  5.70s-21.1MB  9.85s-42.6MB  DLHp2
 
Also, having 
 
   tfold ((x:xs):t) = x : xs `merge` tfold (pairs t)
 where pairs ((x:xs):ys:t) = (x : merge xs ys) : pairs t
 
   hfold  xs = serve . foldTree  mergeP . map vip $ xs
   hfold' xs = serve . foldTree' mergeP . map vip $ xs
 
   foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
 where pairs ~(x: ~(y:ys)) = f x y : pairs ys
 
   foldTree' f (x:xs) = x `f` foldTree' f (pairs xs)
 where pairs (x: (y:ys)) = f x y : pairs ys
 
and
 
   bad = (1:10:error 1) : (2:3:5:error 2) : (4:error 4) 
   : error bad
   bad2 = (1:10:error 1) : (2:3:5:error 2) : (4:error 4)
   : (5:error 5) : (6:error 6)
   : (7:error 7) 
   : error bad2

we get
 
   *Main hfold bad
   [1,2,3,4*** Exception: bad
   *Main hfold' bad
   [1,2,3,4*** Exception: bad
   *Main tfold bad
   [1,2*** Exception: bad
 
   *Main hfold bad2
   [1,2,3,4*** Exception: 4
   *Main hfold' bad2
   [1,2,3,4*** Exception: bad2
   *Main tfold bad2
   [1,2*** Exception: bad2
 
so hfold' too appears to be over-eager to-the-right, although still more 
productive than tfold.



 Regards,
 Heinrich Apfelmus
 
 --
 http://apfelmus.nfshost.com
 





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] V.I.P.s and the associativity of merge'

2010-12-29 Thread Leon Smith
Ok,  after mulling over the issues that Will Ness has brought up in
the last few days [1],  I think I have a partial explanation for the
apparent tension between Will's observations and Heinrich Apfelmus's
Implicit Heaps article [2],  which both concern the implementation of
mergeAll [3].

The merge' function takes two ordered lists,  with the head of the
first list less than the head of the second,  and merges their
contents:

   merge' [] ys = ys
   merge' (x:xs) ys = x : merge xs ys

The nice thing about this function is we can merge an infinite number
of lists by folding right,  if assume that the heads of those lists
are appropriately ordered.  This appears in Richard Bird's code at the
end of Melissa O'Neill's Genuine Sieve of Eratosthenes,  though
undoubtedly this observation has been independently made by many
people.

Now,  in an ordinary sense,   merge' *is* an associative operator,  in
that given three fully defined ordered lists with ordered heads,
merge' xs (merge' ys zs) == merge' (merge' xs ys) zs.   This allows us
to merge an infinite number of lists using an arbitrary tree of merge'
operations,  without ever worrying that we will return an incorrect
result.   (However,  we might get stuck in a non-productive loop,  for
example if we fold left over an infinite list of ordered lists)

However,  Heinrich's article uses a stronger sense of associativity
which includes laziness properties and thus captures some operational
characteristics.  In this sense,  merge' *is not* associative.To
remedy this,  Heinrich uses VIPs to strengthen the associativity
property of merge'.   This raises the question,  is there some
combination of the shape of the merge' tree and some input for which
using VIPs dramatically changes the efficiency of a mergeAll
operation?   I suspect the answer is yes,  though I don't know for
sure at this point in time.

However,  I do tacitly believe that the current tree that mergeAll
uses doesn't exhibit this property for any input,   and so I have
simplified the implementations of mergeAll and unionAll in the latest
version of data-ordlist-0.4.4 by avoiding the use of VIPs.   This has
the nice side benefit of modestly improving performance when the
elements of the result are highly biased towards the first list.

Best,
Leon

[1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
[2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
[3] 
http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html#v:mergeAll

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe