[Haskell-cafe] Re: k-minima in Haskell
[EMAIL PROTECTED] wrote: You have n elements, and you need to locate a specific k-element permutation. There are n! / (n-k)! such permutations. You therefore need log(n! / (n-k)!) bits of information. A binary comparison provides one bit of information. So the number of comparisons that you need to get that much information is: O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k)) That looks right to me. If k n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k). Ian Lynagh wrote: Hmm, is something wrong with the following?: [...] Total: O(n + k log k) Mh, I'm not sure. At least, we have log (n! / (n-k)!) = log n! - log (n-k)! = log 1 + log 2 + log 3 + ... + log (n-k) + ... + log n - log 1 - log 2 - log 3 - ... - log (n-k) = log (n-k +1) + ... + log (n-k +k) which is greater than (k log (n-k)) and smaller than (k log n). For k=1, this estimate yields a minimum time of (log n) to find the minimum of a list. While not wrong, this clearly underestimates the necessary O(n). I think that estimating (n log (n/(n-k)) to n for k n is not correct because the logarithm of 1 = n/n is 0 and not 1 :) Ian Lynagh wrote: [1] http://en.wikipedia.org/wiki/Selection_algorithm Thanks for the link, Ian. It clearly shows that a lazy take k . qsort takes only O(n + k log k) time. I posted an analysis as follow up to the old thread on haskell-general http://article.gmane.org/gmane.comp.lang.haskell.general/15110 Regards, apfelmus ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: k-minima in Haskell
[EMAIL PROTECTED] wrote: Quoting apfelmus [EMAIL PROTECTED]: You mean O(k * log n + n) of course. Erm, yes. You can do it in an imperative language by building a heap in O(n) time followed by removing k elements, in O(k log n) time. Ah, sorry, there are indeed to variants not comparable to each other. Threading a heap of k elements over the entire list needs O(n log k + k) time and putting all of the list into a heap takes O(k log n + n) time. For say k = O(sqrt(n)), the former is slower than the latter but it only needs to keep O(k) list elements in memory. I think that every k-minima algorithm of the form take k . sort has to keep all list elements in memory: the sort may not throw away anything because it cannot know how many elements are requested. Regards, apfelmus ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: k-minima in Haskell
For various reasons I had a similar problem that I solved iteratively simply with a sorted list of the actual best elements. The only particular things were 1. keep element count (to easily know if the element should be inserted in any case) 2. keep the list in reverse order to have the biggest element as first, and make the common case (list stays the same) fast 3. make the list strict (avoid space leaks) not the best in worst case of decreasingly ordered elements O(n*k), but good enough for me. A set + explicit maximal element would probably be the best solution. Fawzi -- | keeps the score of the n best (high score) -- (uses list, optimized for small n) data NBest a = NBest Int [a] deriving (Eq) -- | merges an element in the result with given ranking function merge1 :: Int - (a - Double) - a - NBest a - NBest a merge1 n rankF fragment (NBest m []) | m==0 n0 = NBest 1 [fragment] | m==0 = NBest 0 [] | otherwise = error empty list and nonzero count merge1 n rankF fragment (NBest m (xl@(x0:xs))) | nm = NBest (m+1) (insertOrdered fragment xl) | rankF fragment (rankF x0) = NBest n (insertOrdered fragment xs) | otherwise = NBest m xl where insertOrdered x (x1:xr) | rankF x = rankF x1 = x:x1:xr | otherwise = let r = insertOrdered x xr in x1 `seq` r `seq` x1:r where insertOrdered x [] = [x] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: k-minima in Haskell
raghu vardhan [EMAIL PROTECTED]: So, any algorithm that sorts is no good (think of n as huge, and k small). With lazy evaluation, it is! http://article.gmane.org/gmane.comp.lang.haskell.general/15010 [EMAIL PROTECTED] wrote: The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell. (It's not only most elegant, it can even be put to work.) The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort. Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time. You mean O(k * log n + n) of course. Regards, apfelmus ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: k-minima in Haskell
Does the answer change if the data source isn't a list, already in memory, but a stream? That is, will the sort end up pulling the entire stream into memory, when we only need k elements from the entire stream. Interestingly, this is almost exactly the same as one of my standard interview questions, with the main difference being looking for the k elements closest to a target value, rather than the smallest. Not that it really changes the problem, but recognizing that is one of the things I'm looking for. On 4/12/07, apfelmus [EMAIL PROTECTED] wrote: raghu vardhan [EMAIL PROTECTED]: So, any algorithm that sorts is no good (think of n as huge, and k small). With lazy evaluation, it is! http://article.gmane.org/gmane.comp.lang.haskell.general/15010 [EMAIL PROTECTED] wrote: The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell. (It's not only most elegant, it can even be put to work.) The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort. Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time. You mean O(k * log n + n) of course. Regards, apfelmus ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: k-minima in Haskell
Ah, but which k elements? You won't know until you've drained your entire stream! Dan Steve Downey wrote: Does the answer change if the data source isn't a list, already in memory, but a stream? That is, will the sort end up pulling the entire stream into memory, when we only need k elements from the entire stream. Interestingly, this is almost exactly the same as one of my standard interview questions, with the main difference being looking for the k elements closest to a target value, rather than the smallest. Not that it really changes the problem, but recognizing that is one of the things I'm looking for. On 4/12/07, *apfelmus* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: raghu vardhan [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]: So, any algorithm that sorts is no good (think of n as huge, and k small). With lazy evaluation, it is! http://article.gmane.org/gmane.comp.lang.haskell.general/15010 [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell. (It's not only most elegant, it can even be put to work.) The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort. Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time. You mean O(k * log n + n) of course. Regards, apfelmus ___ Haskell-Cafe mailing list [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: k-minima in Haskell
Dan Weston [EMAIL PROTECTED] writes: Ah, but which k elements? You won't know until you've drained your entire stream! True, but you still don't need to keep the whole stream in memory at once, just the k-least-so-far as you work your way through the stream - once you've read a part of the stream you can mostly forget it again. The question as I understood it was one of if even in Haskell there's a better way than sorting that means you need only have a fragment of the stream in memory at once. -- Mark ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: k-minima in Haskell
Both Yitzchak's and my suggestions should run in constant space--some strictness annotation or switching to foldl' might be necessary. On 4/12/07, Mark T.B. Carroll [EMAIL PROTECTED] wrote: Dan Weston [EMAIL PROTECTED] writes: Ah, but which k elements? You won't know until you've drained your entire stream! True, but you still don't need to keep the whole stream in memory at once, just the k-least-so-far as you work your way through the stream - once you've read a part of the stream you can mostly forget it again. The question as I understood it was one of if even in Haskell there's a better way than sorting that means you need only have a fragment of the stream in memory at once. -- Mark ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: k-minima in Haskell
G'day all. Quoting apfelmus [EMAIL PROTECTED]: You mean O(k * log n + n) of course. Erm, yes. You can do it in an imperative language by building a heap in O(n) time followed by removing k elements, in O(k log n) time. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe