Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
On 7/9/09, Heinrich Apfelmus apfel...@quantentunnel.de wrote:

> Of course, some part of the algorithm has to be recursive, but this
> can be outsourced to a general recursion scheme, like the hylomorphism
>
>     hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
>     hylo f g = g . fmap (hylo f g) . f

Is that definition of hylo actually usable? A few people on IRC tried to
use that definition for a few examples, but the examples failed to
terminate or blew up the stack.
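A minimal, self-contained test of that definition (a sketch, not one of
the unshown IRC examples) does terminate fine when the coalgebra reaches
a base case:

    {-# LANGUAGE DeriveFunctor #-}

    -- base functor whose fixed point is a list of factors
    data ListF a b = Nil | Cons a b deriving Functor

    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
    hylo f g = g . fmap (hylo f g) . f

    -- factorial as a hylomorphism: unfold n, n-1, ..., 1, then fold with (*)
    factorial :: Integer -> Integer
    factorial = hylo unroll alg
      where
        unroll 0 = Nil
        unroll n = Cons n (n - 1)
        alg Nil        = 1
        alg (Cons n r) = n * r

    main :: IO ()
    main = print (factorial 10)  -- 3628800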
[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
Raynor Vliegendhart wrote:

> On 7/9/09, Heinrich Apfelmus apfel...@quantentunnel.de wrote:
>> Of course, some part of the algorithm has to be recursive, but this
>> can be outsourced to a general recursion scheme, like the hylomorphism
>>
>>     hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
>>     hylo f g = g . fmap (hylo f g) . f
>
> Is that definition of hylo actually usable? A few people on IRC tried
> to use that definition for a few examples, but the examples failed to
> terminate or blew up the stack.

The implementation of quicksort with hylo works fine for me, given
medium-sized inputs like, for example, quicksort (reverse [1..1000]).
What were the examples you tried?

Regards,
apfelmus

--
http://apfelmus.nfshost.com
[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
Matthias Görgens wrote:

> Thanks. I heard about the hylo-, ana- and catamorphisms before, but
> never explicitly used them. Time to get started.

You did use them explicitly :) , namely in

    treeSort = bootstrap partitionOnMedian

    bootstrap f = Fix . helper . f
        where helper = fmap (Fix . helper . f)

    partitionOnMedian :: (Ord a) => S.Seq a -> BTreeRaw a (S.Seq a)

since

    bootstrap f = ana f

In particular, we have this slight reformulation

    bootstrap f = helper'
        where
        helper  = fmap helper'
        helper' = Fix . helper . f

and hence

    bootstrap f = helper'
        where helper' = Fix . fmap helper' . f

and thus

    bootstrap f = Fix . fmap (bootstrap f) . f

which is the definition of  ana f .

> And yet another question: One can get the median in deterministic
> linear time. For quicksort, choosing the median as pivot keeps the
> O(n log n) average running time and brings down the expected worst
> case, too. Do you know of a (natural) way to combine selecting the
> median and doing the quicksort, so that you don't compare
> unnecessarily?
>
> The way to de-randomize quickselect is to calculate medians of
> medians, i.e. to solve the problem for smaller instances first. I
> suspect that if we follow this strategy with the reified quicksort
> call trees, the de-randomized quicksort will look a lot like
> mergesort.

Interesting question! (To which I don't know the answer. ;) )

I never understood the median-of-medians method because I always
thought that it calculates an approximate median of approximate
medians, which I now realize is not the case.

I concur that there should be something better than recursively wasting
comparisons in quickselect just for finding the median pivot in
quicksort, especially since both are the same algorithm modulo lazy
evaluation.

Concerning quicksort looking like mergesort, it seems like a good idea
to somehow merge the groups of 5 from the median-of-medians
calculation. However, there is an important difference between
quicksort and mergesort, namely lazy quicksort is able to produce the
batch of the first k smallest elements in

    O(n + k log k) total time [1]

while lazy mergesort needs

    O(n + k log n) total time [2]

There is something about quicksort that makes it better than mergesort.

[1] Proof goes like this: First, quicksort chooses a sublist of
smallest items recursively until a list of the smallest k items
remains; this takes O(n) time. Then the list consisting of the smallest
k items is sorted in Θ(k log k).

[2] Mergesort builds a tournament tree in Θ(n) time and then performs k
tournaments that take O(log n) time each, so the second phase is
O(k log n).

I need to think about this.

Regards,
apfelmus

--
http://apfelmus.nfshost.com
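The laziness claim in [1] can be illustrated directly. A minimal sketch
(an illustration, not code from the thread) of taking the k smallest
elements through the ordinary lazy quicksort:

    import Data.List (partition)

    qsort :: Ord a => [a] -> [a]
    qsort []     = []
    qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
      where (ls, rs) = partition (<= x) xs

    -- take k . qsort only forces the recursive calls needed to produce
    -- the first k elements; sublists to the right of a pivot that the
    -- take never reaches stay unsorted thunks.
    kSmallest :: Ord a => Int -> [a] -> [a]
    kSmallest k = take k . qsort

    -- kSmallest 3 [5,1,4,2,3] == [1,2,3]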
Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
Interesting. Can you make the definition of quicksort non-recursive,
too? Perhaps with the help of a bootstrapping combinator like the one
implicit in the approach I gave earlier?

    treeSort = bootstrap partitionOnMedian

    bootstrap f = Fix . helper . f
        where helper = fmap (Fix . helper . f)

    partitionOnMedian :: forall a. (Ord a) => S.Seq a -> BTreeRaw a (S.Seq a)

Extra points if you can give 'bootstrap' or an equivalent in terms of
existing Haskell combinators.

Matthias.
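The snippet above leaves Fix, BTreeRaw, and partitionOnMedian
undefined. The following sketch fills in one plausible set of
definitions so it compiles; BTreeRaw and the pivot choice (the head
rather than a true median) are assumptions, not necessarily the
originals:

    {-# LANGUAGE DeriveFunctor #-}
    import qualified Data.Sequence as S
    import Data.Sequence (Seq, ViewL (..), viewl)

    newtype Fix f = Fix (f (Fix f))

    -- assumed shape: a node holds the pivot between two unsorted parts
    data BTreeRaw a b = Leaf | Node b a b deriving Functor

    partitionOnMedian :: Ord a => Seq a -> BTreeRaw a (Seq a)
    partitionOnMedian s = case viewl s of
        EmptyL  -> Leaf
        x :< xs -> Node (S.filter (<= x) xs) x (S.filter (> x) xs)

    bootstrap :: Functor f => (a -> f a) -> a -> Fix f
    bootstrap f = Fix . helper . f
        where helper = fmap (Fix . helper . f)

    treeSort :: Ord a => Seq a -> Fix (BTreeRaw a)
    treeSort = bootstrap partitionOnMedian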
[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
Matthias Görgens wrote:

> Interesting. Can you make the definition of quicksort non-recursive,
> too? Perhaps with the help of a bootstrapping combinator like the one
> implicit in the approach I gave earlier?
>
>     treeSort = bootstrap partitionOnMedian
>
>     bootstrap f = Fix . helper . f
>         where helper = fmap (Fix . helper . f)
>
>     partitionOnMedian :: forall a. (Ord a) => S.Seq a -> BTreeRaw a (S.Seq a)
>
> Extra points if you can give 'bootstrap' or an equivalent in terms of
> existing Haskell combinators.

Sure, no problem. Of course, some part of the algorithm has to be
recursive, but this can be outsourced to a general recursion scheme,
like the hylomorphism

    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
    hylo f g = g . fmap (hylo f g) . f

This scheme is a combination of the anamorphism which builds the tree
of recursive calls

    data Fix f = In { out :: f (Fix f) }

    ana :: Functor f => (a -> f a) -> a -> Fix f
    ana f = In . fmap (ana f) . f

and a catamorphism which combines the results

    cata :: Functor f => (f b -> b) -> Fix f -> b
    cata g = g . fmap (cata g) . out

so we could also write

    hylo f g = cata g . ana f

For quicksort, the call tree is the fixed point of the functor

    type Size = Int
    data Q a b = Empty | Merge Size a b b

    instance Functor (Q a) where
        fmap f (Merge n x b b') = Merge n x (f b) (f b')
        fmap f Empty            = Empty

The algorithm

    quicksort = hylo partition merge

proceeds in the two well-known phases: first, the input list is
partitioned into two parts separated by a pivot

    partition []     = Empty
    partition (x:xs) = Merge (length xs + 1) x ls rs
        where
        ls = filter (<= x) xs
        rs = filter (>  x) xs

and then the sorted results are merged

    merge Empty           = []
    merge (Merge _ x a b) = a ++ [x] ++ b

The random access tree implementation can be obtained by using a
different merge function. In particular, the quicksort from my previous
post was parametrized on merge (with a slightly different way to
indicate the list lengths); any function of type

    (Size -> a -> b -> b -> b) -> b -> c

is equivalent to a function of type

    (Q a b -> b) -> c

Incidentally, the random access tree *is* the call tree for quicksort,
so we'd have

    merge = In

and we can shorten the whole thing to

    quicksort = ana partition

which is essentially what you coded.

To read about hylo f g = cata g . ana f with quicksort as example again
in a slightly different light, see also the following blog post by
Ulisses Costa

    http://ulissesaraujo.wordpress.com/2009/04/09/hylomorphisms-in-haskell/

Regards,
apfelmus

--
http://apfelmus.nfshost.com
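For reference, the pieces above assemble into a single compilable
module (just an arrangement of the code already shown, with a derived
Functor instance in place of the handwritten one):

    {-# LANGUAGE DeriveFunctor #-}

    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
    hylo f g = g . fmap (hylo f g) . f

    type Size = Int
    data Q a b = Empty | Merge Size a b b deriving Functor

    -- split off a pivot; Size records the length of the whole input
    partition :: Ord a => [a] -> Q a [a]
    partition []     = Empty
    partition (x:xs) = Merge (length xs + 1) x ls rs
      where
        ls = filter (<= x) xs
        rs = filter (>  x) xs

    -- flatten one node of the call tree back into a sorted list
    merge :: Q a [a] -> [a]
    merge Empty           = []
    merge (Merge _ x a b) = a ++ [x] ++ b

    quicksort :: Ord a => [a] -> [a]
    quicksort = hylo partition merge

    -- quicksort (reverse [1..1000]) == [1..1000]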
Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
Thanks. I heard about the hylo-, ana- and catamorphisms before, but
never explicitly used them. Time to get started.

And yet another question: One can get the median in deterministic
linear time. For quicksort, choosing the median as pivot keeps the
O(n log n) average running time and brings down the expected worst
case, too. Do you know of a (natural) way to combine selecting the
median and doing the quicksort, so that you don't compare
unnecessarily?

The way to de-randomize quickselect is to calculate medians of medians,
i.e. to solve the problem for smaller instances first. I suspect that
if we follow this strategy with the reified quicksort call trees, the
de-randomized quicksort will look a lot like mergesort.
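For reference, here is a sketch of the textbook median-of-medians
selection alluded to above (the standard algorithm, transcribed as an
illustration, not code from this thread). Choosing the pivot as the
median of the medians of groups of five makes quickselect worst-case
linear:

    import Data.List (sort)

    chunksOf :: Int -> [a] -> [[a]]
    chunksOf _ [] = []
    chunksOf n xs = take n xs : chunksOf n (drop n xs)

    -- select k xs returns the k-th smallest element (0-based)
    select :: Ord a => Int -> [a] -> a
    select k xs
        | length xs <= 5 = sort xs !! k
        | k < l          = select k ls
        | k < l + e      = pivot
        | otherwise      = select (k - l - e) rs
      where
        median g = sort g !! (length g `div` 2)
        medians  = map median (chunksOf 5 xs)
        pivot    = select (length medians `div` 2) medians
        ls = filter (<  pivot) xs
        rs = filter (>  pivot) xs
        e  = length (filter (== pivot) xs)
        l  = length ls

    -- select (length xs `div` 2) xs is the median in O(n) worst case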
[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
Matthias Görgens wrote:

>> So, a tree like Matthias implements it is the way to go. Basically,
>> it reifies the recursive calls of quicksort as a lazy data structure
>> which can be evaluated piecemeal.
>
> Yes. I wonder if it is possible to use a standard (randomized)
> quicksort and employ some type class magic (like continuations) to
> make the reification [1] transparent to the code.
>
> Matthias.
>
> [1] I reified reify.

Well, you can perform an abstraction similar to what leads to the
definition of  fold . Starting with, say,

    sum []     = 0
    sum (x:xs) = x + sum xs

we can replace the special values 0 and + by variables, leading to a
function

    (fold f z) []     = z
    (fold f z) (x:xs) = x `f` (fold f z) xs

In other words, if we specialize the variables to 0 and + again, we get

    fold (+) 0 = sum

Similarly, we can start with quicksort

    quicksort :: Ord a => [a] -> [a]
    quicksort []     = []
    quicksort (x:xs) = quicksort ls ++ [x] ++ quicksort rs
        where
        ls = filter (<= x) xs
        rs = filter (>  x) xs

and replace the operations on the return type with variables

    quicksort :: Ord a => (a -> b -> b -> b) -> b -> [a] -> b
    quicksort f z []     = z
    quicksort f z (x:xs) = f x (quicksort f z ls) (quicksort f z rs)
        where
        ls = filter (<= x) xs
        rs = filter (>  x) xs

Note however that this is not quite enough to handle the case of a
random access tree like you wrote it, because the latter depends on the
fact that quicksort *preserves* the length of the list. What I mean is
that we have to keep track of the lengths of the lists, for instance
like this:

    quicksort :: Ord a => (a -> (Int,b) -> (Int,b) -> b) -> b -> [a] -> (Int,b)
    quicksort f z []     = (0, z)
    quicksort f z (x:xs) =
        (length xs + 1, f x (quicksort f z ls) (quicksort f z rs))
        where
        ls = filter (<= x) xs
        rs = filter (>  x) xs

And we can implement a random access tree

    type Sized a = (Int, a)
    size = fst

    data Tree a = Empty | Branch a (Sized (Tree a)) (Sized (Tree a))

    mySort :: Ord a => [a] -> Sized (Tree a)
    mySort = quicksort Branch Empty

    index :: Sized (Tree a) -> Int -> Maybe a
    index (0, Empty)        _ = Nothing
    index (n, Branch x a b) k
        | 1 <= k && k <= size a    = index a k
        | k == size a + 1          = Just x
        | size a + 1 < k && k <= n = index b (k - size a - 1)
        | otherwise                = Nothing

or an ordinary sort

    qsort :: Ord a => [a] -> [a]
    qsort = quicksort (\x a b -> snd a ++ [x] ++ snd b) []

Regards,
apfelmus

--
http://apfelmus.nfshost.com
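A short usage sketch of the definitions above (an illustration; note
the 1-based indexing implied by the guards of index):

    main :: IO ()
    main = do
        let example = [5, 1, 4, 2, 3] :: [Int]
        print (index (mySort example) 1)  -- Just 1
        print (index (mySort example) 6)  -- Nothing
        print (qsort example)             -- [1,2,3,4,5]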
[Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
Petr Pudlak wrote:

> About a month ago, we were discussing sorting in Haskell with a
> friend. We realized a nice property of lazy merge sort (which is AFAIK
> the implementation of Data.List.sort): Getting the first element of
> the sorted list actually requires O(n) time, not O(n * log n) as in
> imperative languages. Similarly, taking the first k elements will take
> O(n + k log n) time.

And this also works for the standard lazy quicksort. See also

    http://apfelmus.nfshost.com/quicksearch.html

> And immediately we asked: Would it be possible to create a lazy
> selection/sorting algorithm so that getting any element of the sorted
> list/array by its index would require just O(n) time, and getting all
> the elements would still be in O(n * log n)?
>
> More precisely: The result of sorting an n-element list or array
> should be a structure that allows to ask for the i-th element of the
> result (for example, a lazy array). Asking for k arbitrary elements
> should take no more than O(min(n * log n, k * n)).

If you want to take elements only from the beginning, then using a list
as the result type is enough. But you want random access, and you
rightly note that this requires some other data structure to store the
sorted result, simply because

    xs !! k  is at least O(k) time

So, a tree like Matthias implements it is the way to go. Basically, it
reifies the recursive calls of quicksort as a lazy data structure which
can be evaluated piecemeal.

(If you don't care about the O(k) inherent in xs !! k and only ask
about the number of comparisons it takes to evaluate xs !! k, then it
is possible to make the standard quicksort slightly lazier so that this
works, too. Details in the link given above.)

Regards,
apfelmus

--
http://apfelmus.nfshost.com
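The O(n) claim can be checked empirically. A quick-and-dirty sketch
(the unsafePerformIO counter is a measurement hack, not production
style; the numbers are only indicative, especially with optimizations
on) counting how many comparisons head of a sort actually forces:

    import Data.IORef
    import Data.List (sortBy)
    import System.IO.Unsafe (unsafePerformIO)

    counter :: IORef Int
    counter = unsafePerformIO (newIORef 0)
    {-# NOINLINE counter #-}

    -- a comparison that bumps the counter as a side effect
    countingCompare :: Ord a => a -> a -> Ordering
    countingCompare x y = unsafePerformIO $ do
        modifyIORef' counter (+ 1)
        return (compare x y)

    main :: IO ()
    main = do
        let xs = [10000, 9999 .. 1] :: [Int]
        print (head (sortBy countingCompare xs))
        n <- readIORef counter
        print n  -- grows linearly with the input size, not like n*log n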
Re: [Haskell-cafe] Re: exercise - a completely lazy sorting algorithm
> So, a tree like Matthias implements it is the way to go. Basically, it
> reifies the recursive calls of quicksort as a lazy data structure
> which can be evaluated piecemeal.

Yes. I wonder if it is possible to use a standard (randomized)
quicksort and employ some type class magic (like continuations) to make
the reification [1] transparent to the code.

Matthias.

[1] I reified reify.