Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-12 Thread Raynor Vliegendhart
On 7/9/09, Heinrich Apfelmus <apfel...@quantentunnel.de> wrote:

 Of course, some part of the algorithm has to be recursive, but this can be
 outsourced to a general recursion scheme, like the hylomorphism

hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
hylo f g = g . fmap (hylo f g) . f


Is that definition of hylo actually usable? A few on IRC tried to use
that definition for a few examples, but the examples failed to
terminate or blew up the stack.


[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-12 Thread Heinrich Apfelmus
Raynor Vliegendhart wrote:
 On 7/9/09, Heinrich Apfelmus <apfel...@quantentunnel.de> wrote:
 Of course, some part of the algorithm has to be recursive, but this can be
 outsourced to a general recursion scheme, like the hylomorphism

hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
hylo f g = g . fmap (hylo f g) . f

 
 Is that definition of hylo actually usable? A few on IRC tried to use
 that definition for a few examples, but the examples failed to
 terminate or blew up the stack.

The implementation of  quicksort  with  hylo  works fine for me, given
medium-sized inputs like, for example,  quicksort (reverse [1..1000]) .
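
For reference, here is that test as a self-contained program, assembled
from the definitions given elsewhere in this thread (the only liberties
taken: a derived Functor instance and a  main ):

    {-# LANGUAGE DeriveFunctor #-}

    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
    hylo f g = g . fmap (hylo f g) . f

    data Q a b = Empty | Merge Int a b b deriving Functor

    partition :: Ord a => [a] -> Q a [a]
    partition []     = Empty
    partition (x:xs) = Merge (length xs + 1) x (filter (<= x) xs)
                                               (filter (> x) xs)

    merge :: Q a [a] -> [a]
    merge Empty           = []
    merge (Merge _ x l r) = l ++ [x] ++ r

    quicksort :: Ord a => [a] -> [a]
    quicksort = hylo partition merge

    main :: IO ()
    main = print $ quicksort (reverse [1..1000]) == [1..1000]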

What were the examples you tried?


Regards,
apfelmus

--
http://apfelmus.nfshost.com



[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-11 Thread Heinrich Apfelmus
Matthias Görgens wrote:
 Thanks.  I heard about the hylo-, ana- and catamorphisms before, but
 never explicitly used them.  Time to get started.

You did use them explicitly :), namely in

  treeSort = bootstrap partitionOnMedian
  bootstrap f = Fix . helper . f
      where helper = fmap (Fix . helper . f)

  partitionOnMedian :: (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)

since

  bootstrap f = ana f

In particular, we have this slight reformulation

  bootstrap f = helper'
      where
      helper  = fmap helper'
      helper' = Fix . helper . f

and hence

  bootstrap f = helper'
      where helper' = Fix . fmap helper' . f

and thus

  bootstrap f = Fix . fmap (bootstrap f) . f

which is the definition of  ana f .
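
As a compiler-checked sketch, here are both combinators side by side at
the same type (writing the constructor as  Fix , following your code;
In/out  from my other post would do just as well):

    newtype Fix f = Fix { unFix :: f (Fix f) }

    -- the anamorphism: unfold a coalgebra into the fixed point
    ana :: Functor f => (a -> f a) -> a -> Fix f
    ana f = Fix . fmap (ana f) . f

    -- your combinator, verbatim
    bootstrap :: Functor f => (a -> f a) -> a -> Fix f
    bootstrap f = Fix . helper . f
        where helper = fmap (Fix . helper . f)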


 And yet another question: One can get the median in deterministic
 linear time.  For quicksort choosing the median as pivot keeps the O(n
 log n) average running time and brings down the expected worst case,
 too.  Do you know of a (natural) way to combine selecting the median
 and doing the quicksort, so that you don't compare unnecessarily?

 The way to de-randomize quickselect is to calculate medians of
 medians.  I.e. solve the problem for smaller instances first.  I
 suspect if we follow this strategy with the reified quicksort
 call-trees, the de-randomized quicksort will look a lot like
 mergesort.

Interesting question! (To which I don't know the answer. ;) )

I never understood the median-of-medians method because I always thought
that it calculates an approximate median of approximate medians, which I
now realize is not the case. I concur that there should be something
better than recursively wasting comparisons in quickselect just for
finding the median pivot in quicksort, especially since both are the
same algorithm modulo lazy evaluation.
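
For concreteness, here is a sketch of the deterministic quickselect in
question; this is not from the thread, just the textbook groups-of-5
scheme, with a 1-based index k and an ad hoc helper  chunksOf :

    import Data.List (sort)

    select :: Ord a => Int -> [a] -> a
    select k xs
        | length xs <= 5 = sort xs !! (k - 1)
        | k <= l         = select k ls            -- answer is left of the pivot
        | k <= l + e     = pivot                  -- the pivot (or a duplicate)
        | otherwise      = select (k - l - e) rs  -- right of the pivot
        where
        -- median of each group of 5, then the true median of those medians
        medians = [ sort g !! (length g `div` 2) | g <- chunksOf 5 xs ]
        pivot   = select ((length medians + 1) `div` 2) medians
        ls      = filter (< pivot) xs
        rs      = filter (> pivot) xs
        l       = length ls
        e       = length xs - l - length rs       -- copies of the pivot

    chunksOf :: Int -> [a] -> [[a]]
    chunksOf _ [] = []
    chunksOf n ys = let (a, b) = splitAt n ys in a : chunksOf n b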

Concerning quicksort looking like mergesort, it seems like a good idea
to somehow merge the groups of 5 from the median-of-medians calculation.
However, there is an important difference between quicksort and
mergesort, namely that lazy quicksort is able to produce the batch of the
first k smallest elements in

  O(n + k log k) total time [1]

while lazy mergesort needs

  O(n + k log n) total time [2]

There is something about quicksort that makes it better than mergesort.


[1] Proof goes like this: First, quicksort recursively descends into the
sublist of smallest items until a list of the smallest k items remains;
this takes O(n) time. Then that list consisting of the smallest k items
is sorted in Θ(k log k).
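
In symbols, assuming the median pivot halves the list each time:

    \underbrace{n + \tfrac{n}{2} + \tfrac{n}{4} + \dots + k}_{\text{phase 1:
    shrink to the $k$ smallest}} \;\le\; 2n \;=\; O(n)
    \qquad
    \underbrace{\Theta(k \log k)}_{\text{phase 2: sort them}}
    \quad\Longrightarrow\quad O(n + k \log k) \text{ total}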

[2] Mergesort builds a tournament tree in Θ(n) time and then performs k
tournaments that take O(log n) time each, so the second phase is O(k
log n).

I need to think about this.



Regards,
apfelmus

--
http://apfelmus.nfshost.com



Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-09 Thread Matthias Görgens
Interesting.  Can you make the definition of quicksort non-recursive,
too?  Perhaps with help of a bootstrapping combinator like the one
implicit in the approach I have given earlier?

 treeSort = bootstrap partitionOnMedian
 bootstrap f = Fix . helper . f
     where helper = fmap (Fix . helper . f)

 partitionOnMedian :: forall a. (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)

Extra points if you can give 'bootstrap' or an equivalent in terms of
existing Haskell combinators.

Matthias.


[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-09 Thread Heinrich Apfelmus
Matthias Görgens wrote:
 Interesting.  Can you make the definition of quicksort non-recursive,
 too?  Perhaps with help of a bootstrapping combinator like the one
 implicit in the approach I have given earlier?
 
 treeSort = bootstrap partitionOnMedian
 bootstrap f = Fix . helper . f
     where helper = fmap (Fix . helper . f)

 partitionOnMedian :: forall a. (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)
 
 Extra points if you can give 'bootstrap' or an equivalent in terms of
 existing Haskell combinators.

Sure, no problem.

Of course, some part of the algorithm has to be recursive, but this can be
outsourced to a general recursion scheme, like the hylomorphism

hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
hylo f g = g . fmap (hylo f g) . f

This scheme is a combination of the anamorphism which builds the tree of
recursive calls

data Fix f = In { out :: f (Fix f) }

ana :: Functor f => (a -> f a) -> a -> Fix f
ana f = In . fmap (ana f) . f

and a catamorphism which combines the results

cata :: Functor f => (f b -> b) -> Fix f -> b
cata g = g . fmap (cata g) . out

so we could also write

hylo f g = cata g . ana f



For quicksort, the call tree is the fixed point of the functor

type Size  = Int
data Q a b = Empty | Merge Size a b b

instance Functor (Q a) where
    fmap f (Merge n x b b') = Merge n x (f b) (f b')
    fmap f Empty            = Empty

The algorithm

quicksort = hylo partition merge

proceeds in the two well-known phases: first, the input list is
partitioned into two parts separated by a pivot

partition []     = Empty
partition (x:xs) = Merge (length xs + 1) x ls rs
    where
    ls = filter (<= x) xs
    rs = filter (> x) xs

and then the sorted results are merged

merge Empty   = []
merge (Merge _ x a b) = a ++ [x] ++ b

The random access tree implementation can be obtained by using a
different  merge  function. In particular, the quicksort from my
previous post was parametrized on  merge  (with a slightly different way
to indicate the list lengths); any function of type

(Size -> a -> b -> b -> b) -> b -> c

is equivalent to a function of type

(Q a b -> b) -> c
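
Spelled out, the equivalence is just (un)currying over the two
constructors of  Q ; the names  toAlg  and  fromAlg  are ad hoc:

    toAlg :: (Size -> a -> b -> b -> b) -> b -> (Q a b -> b)
    toAlg f z Empty           = z
    toAlg f z (Merge n x l r) = f n x l r

    fromAlg :: (Q a b -> b) -> (Size -> a -> b -> b -> b, b)
    fromAlg g = (\n x l r -> g (Merge n x l r), g Empty)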


Incidentally, the random access tree *is* the call tree for quicksort,
so we'd have

   merge = In

and we can shorten the whole thing to

   quicksort = ana partition

which is essentially what you coded.



To read about  hylo f g = cata g . ana f  with quicksort as an example,
again in a slightly different light, see also the following blog post by
Ulisses Costa

 http://ulissesaraujo.wordpress.com/2009/04/09/hylomorphisms-in-haskell/



Regards,
apfelmus

--
http://apfelmus.nfshost.com



Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-09 Thread Matthias Görgens
Thanks.  I heard about the hylo-, ana- and catamorphisms before, but
never explicitly used them.  Time to get started.

And yet another question: One can get the median in deterministic
linear time.  For quicksort choosing the median as pivot keeps the O(n
log n) average running time and brings down the expected worst case,
too.  Do you know of a (natural) way to combine selecting the median
and doing the quicksort, so that you don't compare unnecessarily?

The way to de-randomize quickselect is to calculate medians of
medians.  I.e. solve the problem for smaller instances first.  I
suspect if we follow this strategy with the reified quicksort
call-trees, the de-randomized quicksort will look a lot like
mergesort.


[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-08 Thread Heinrich Apfelmus
Matthias Görgens wrote:
 So, a tree like the one Matthias implemented is the way to go. Basically,
 it reifies the recursive calls of quicksort as a lazy data structure
 which can be evaluated piecemeal.
 
 Yes.  I wonder if it is possible to use a standard (randomized)
 quicksort and employ some type class magic (like continuations) to
 make the reification [1] transparent to the code.
 
 Matthias.
 [1] I reified reify.

Well, you can perform an abstraction similar to what leads to the
definition of  fold .

Starting with say

   sum [] = 0
   sum (x:xs) = x + sum xs

we can replace the special values 0 and + by variables, leading to a
function

   (fold f z) [] = z
   (fold f z) (x:xs) = x `f` (fold f z) xs

In other words, if we specialize the variables to 0 and + again, we get

   fold (+) 0 = sum


Similarly, we can start with quicksort

   quicksort :: Ord a = [a] - [a]
   quicksort [] = []
   quicksort (x:xs) = quicksort ls ++ [x] ++ quicksort rs
   where
   ls = filter (= x) xs
   rs = filter (  x) xs

and replace the operations on the return type with variables

   quicksort :: Ord a = (a - b - b - b) - b - [a] - b
   quicksort f z [] = z
   quicksort f z (x:xs) = f x (quicksort f z ls) (quicksort f z rs)
   where
   ls = filter (= x) xs
   rs = filter (  x) xs

Note however that this is not quite enough to handle the case of a
random access tree like the one you wrote, because the latter depends on
the fact that quicksort *preserves* the length of the list. What I mean is
that we have to keep track of the lengths of the lists, for instance
like this:

   quicksort :: Ord a =
(a - (Int,b) - (Int,b) - b) - b - [a] - (Int,b)
   quicksort f z [] = (0,z)
   quicksort f z (x:xs) =
   (length xs + 1, f x (quicksort f z ls) (quicksort f z rs))
   where
   ls = filter (= x) xs
   rs = filter (  x) xs

And we can implement a random access tree

   type Sized a = (Int, a)
   size = fst

   data Tree a  = Empty
| Branch a (Sized (Tree a)) (Sized (Tree a))

   mySort :: [a] - Sized (Tree a)
   mySort = quicksort Branch Empty

   index :: Sized (Tree a) - Int - Maybe a
   index (0,Empty)_   = Nothing
   index (n,Branch x a b) k
   | 1 = k  k = size a= index a k
   | k == size a + 1  = Just x
   | size a + 1  k  k = n = index b (k - size a - 1)
   | otherwise= Nothing

or an ordinary sort

   qsort :: Ord a = [a] - [a]
   qsort = quicksort (\x a b - snd a ++ [x] ++ snd b) []



Regards,
apfelmus

--
http://apfelmus.nfshost.com



[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-07 Thread Heinrich Apfelmus
Petr Pudlak wrote:
 about a month ago, we were discussing sorting in Haskell with a friend. We
 realized a nice property of lazy merge-sort (which is AFAIK the implementation
 of Data.List.sort): Getting the first element of the sorted list actually
 requires O(n) time, not O(n * log n) as in imperative languages.

Similarly, taking the first  k  elements will take  O(n + k log n)
time. And this also works for the standard lazy quicksort. See also

   http://apfelmus.nfshost.com/quicksearch.html

 And immediately we asked: Would it be possible to create a lazy
 selection/sorting algorithm so that getting any element of the sorted
 list/array by its index would require just O(n) time, and getting all the
 elements would still be in O(n * log n)?

 More precisely: The result of sorting an n-element list or array should be a
 structure that allows to ask i-th element of the result (for example, a lazy
 array). Asking k arbitrary elements should take no more than
   O(min(n * log n, k * n))

If you want to take elements only from the beginning, then using a list
as result type is enough. But you want random access, and you rightly
note that this requires some other data structure to store the sorted
result, simply because

   xs !! kis at least  O(k)  time

So, a tree like the one Matthias implemented is the way to go. Basically,
it reifies the recursive calls of quicksort as a lazy data structure
which can be evaluated piecemeal.

(If you don't care about the  O(k)  inherent in  xs !! k  and only ask
about the number of comparisons it takes to evaluate  xs !! k , then it
is possible to make the standard  quicksort  slightly lazier so that
this works, too. Details in the link given above.)



Regards,
apfelmus

--
http://apfelmus.nfshost.com



Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm

2009-07-07 Thread Matthias Görgens
 So, a tree like the one Matthias implemented is the way to go. Basically,
 it reifies the recursive calls of quicksort as a lazy data structure
 which can be evaluated piecemeal.

Yes.  I wonder if it is possible to use a standard (randomized)
quicksort and employ some type class magic (like continuations) to
make the reification [1] transparent to the code.

Matthias.
[1] I reified reify.