[Haskell-cafe] Re: k-minima in Haskell

2007-04-14 Thread apfelmus
[EMAIL PROTECTED] wrote:
 You have n elements, and you need to locate a specific k-element
 permutation.  There are n! / (n-k)! such permutations.  You therefore
 need log(n! / (n-k)!) bits of information.

 A binary comparison provides one bit of information.  So the number of
 comparisons that you need to get that much information is:

   O(log(n! / (n-k)!))
 = O(n log n - (n-k) log (n-k))
 = O(n log (n/(n-k)) + k log (n-k))

 That looks right to me.  If k  n, then this simplifies to
 O(n + k log n), and if k is close to n, it simplifies to
 O(n log n + k).

Ian Lynagh wrote:
 Hmm, is something wrong with the following?:
 [...]
 Total: O(n + k log k)

Mh, I'm not sure. At least, we have

   log (n! / (n-k)!)
 = log n! - log (n-k)!
 =  log 1 + log 2 + log 3 + ... + log (n-k) + ... + log n
  - log 1 - log 2 - log 3 - ... - log (n-k)
 = log (n-k +1) + ... + log (n-k +k)

which is greater than (k log (n-k)) and smaller than (k log n). For k=1,
this estimate yields a minimum time of (log n) to find the minimum of a
list. While not wrong, this clearly underestimates the necessary O(n).

I think that estimating (n log (n/(n-k)) to n for k  n is not correct
because the logarithm of 1 = n/n is 0 and not 1 :)

Ian Lynagh wrote:
 [1] http://en.wikipedia.org/wiki/Selection_algorithm

Thanks for the link, Ian. It clearly shows that a lazy

  take k . qsort

takes only O(n + k log k) time. I posted an analysis as follow up to the
old thread on haskell-general

  http://article.gmane.org/gmane.comp.lang.haskell.general/15110


Regards,
apfelmus

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: k-minima in Haskell

2007-04-13 Thread apfelmus
[EMAIL PROTECTED] wrote:
 Quoting apfelmus [EMAIL PROTECTED]: 
 You mean O(k * log n + n) of course.
 
 Erm, yes.  You can do it in an imperative language by building a heap
 in O(n) time followed by removing k elements, in O(k log n) time.

Ah, sorry, there are indeed to variants not comparable to each other.
Threading a heap of k elements over the entire list needs O(n log k + k)
time and putting all of the list into a heap takes O(k log n + n) time.
For say k = O(sqrt(n)), the former is slower than the latter but it only
needs to keep O(k) list elements in memory.

I think that every k-minima algorithm of the form

   take k . sort

has to keep all list elements in memory: the sort may not throw away
anything because it cannot know how many elements are requested.

Regards,
apfelmus

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: k-minima in Haskell

2007-04-13 Thread Fawzi Mohamed
For various reasons I had a similar problem that I solved iteratively 
simply with a sorted list of the actual best elements.


The only particular things were

1. keep element count (to easily know if the element should be inserted 
in any case)


2. keep the list in reverse order to have the biggest element as first, 
and make the common case (list stays the same) fast


3. make the list strict (avoid space leaks)

not the best in worst case of decreasingly ordered elements O(n*k), but 
good enough for me.


A set + explicit maximal element would probably be the best solution.

Fawzi

-- | keeps the score of the n best (high score)
-- (uses list, optimized for small n)
data NBest a = NBest Int [a] deriving (Eq)

-- | merges an element in the result with given ranking function
merge1 :: Int - (a - Double) - a - NBest a - NBest a
merge1 n rankF fragment (NBest m []) | m==0  n0 = NBest 1 [fragment]
   | m==0 = NBest 0 []
   | otherwise = error empty list and nonzero count
merge1 n rankF fragment (NBest m (xl@(x0:xs)))
   | nm = NBest (m+1) (insertOrdered fragment xl)
   | rankF fragment  (rankF x0) = NBest n (insertOrdered fragment xs)
   | otherwise = NBest m xl
   where
 insertOrdered x (x1:xr) | rankF x = rankF x1 = x:x1:xr
 | otherwise =
 let r = insertOrdered x xr
 in x1 `seq` r `seq` x1:r where
 insertOrdered x [] = [x]

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: k-minima in Haskell

2007-04-12 Thread apfelmus
raghu vardhan [EMAIL PROTECTED]:
 So, any algorithm that sorts is no good (think of n as huge, and k small).

With lazy evaluation, it is!

  http://article.gmane.org/gmane.comp.lang.haskell.general/15010


[EMAIL PROTECTED] wrote:
 The essence of all the answers that you've received is that it doesn't
 matter if k is small, sorting is still the most elegant answer in Haskell.

(It's not only most elegant, it can even be put to work.)

 The reason is that in Haskell, a function which sort function will produce the
 sorted list lazily. If you don't demand the while list, you don't perform
 the whole sort.
 
 Some algorithms are better than others for minimising the amount of
 work if not all of the list is demanded, but ideally, producing the
 first k elements will take O(n log k + k) time.

You mean O(k * log n + n) of course.

Regards,
apfelmus

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: k-minima in Haskell

2007-04-12 Thread Steve Downey

Does the answer change if the data source isn't a list, already in memory,
but a stream? That is, will the sort end up pulling the entire stream into
memory, when we only need k elements from the entire stream.

Interestingly, this is almost exactly the same as one of my standard
interview questions, with the main difference being looking for the k
elements closest to a target value, rather than the smallest. Not that it
really changes the problem, but recognizing that is one of the things I'm
looking for.

On 4/12/07, apfelmus [EMAIL PROTECTED] wrote:


raghu vardhan [EMAIL PROTECTED]:
 So, any algorithm that sorts is no good (think of n as huge, and k
small).

With lazy evaluation, it is!

  http://article.gmane.org/gmane.comp.lang.haskell.general/15010


[EMAIL PROTECTED] wrote:
 The essence of all the answers that you've received is that it doesn't
 matter if k is small, sorting is still the most elegant answer in
Haskell.

(It's not only most elegant, it can even be put to work.)

 The reason is that in Haskell, a function which sort function will
produce the
 sorted list lazily. If you don't demand the while list, you don't
perform
 the whole sort.

 Some algorithms are better than others for minimising the amount of
 work if not all of the list is demanded, but ideally, producing the
 first k elements will take O(n log k + k) time.

You mean O(k * log n + n) of course.

Regards,
apfelmus

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: k-minima in Haskell

2007-04-12 Thread Dan Weston
Ah, but which k elements? You won't know until you've drained your 
entire stream!


Dan

Steve Downey wrote:
Does the answer change if the data source isn't a list, already in 
memory, but a stream? That is, will the sort end up pulling the entire 
stream into memory, when we only need k elements from the entire stream.


Interestingly, this is almost exactly the same as one of my standard 
interview questions, with the main difference being looking for the k 
elements closest to a target value, rather than the smallest. Not that 
it really changes the problem, but recognizing that is one of the things 
I'm looking for.


On 4/12/07, *apfelmus* [EMAIL PROTECTED] 
mailto:[EMAIL PROTECTED] wrote:


raghu vardhan [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]:
  So, any algorithm that sorts is no good (think of n as huge, and
k small).

With lazy evaluation, it is!

   http://article.gmane.org/gmane.comp.lang.haskell.general/15010


[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:
  The essence of all the answers that you've received is that it
doesn't
  matter if k is small, sorting is still the most elegant answer in
Haskell.

(It's not only most elegant, it can even be put to work.)

  The reason is that in Haskell, a function which sort function
will produce the
  sorted list lazily. If you don't demand the while list, you don't
perform
  the whole sort.
 
  Some algorithms are better than others for minimising the amount of
  work if not all of the list is demanded, but ideally, producing the
  first k elements will take O(n log k + k) time.

You mean O(k * log n + n) of course.

Regards,
apfelmus

___
Haskell-Cafe mailing list
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe





___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: k-minima in Haskell

2007-04-12 Thread Mark T.B. Carroll
Dan Weston [EMAIL PROTECTED] writes:

 Ah, but which k elements? You won't know until you've drained your 
 entire stream!

True, but you still don't need to keep the whole stream in memory at
once, just the k-least-so-far as you work your way through the stream -
once you've read a part of the stream you can mostly forget it again.
The question as I understood it was one of if even in Haskell there's a
better way than sorting that means you need only have a fragment of the
stream in memory at once.

-- Mark

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: k-minima in Haskell

2007-04-12 Thread Nicolas Frisby

Both Yitzchak's and my suggestions should run in constant space--some
strictness annotation or switching to foldl' might be necessary.

On 4/12/07, Mark T.B. Carroll [EMAIL PROTECTED] wrote:

Dan Weston [EMAIL PROTECTED] writes:

 Ah, but which k elements? You won't know until you've drained your
 entire stream!

True, but you still don't need to keep the whole stream in memory at
once, just the k-least-so-far as you work your way through the stream -
once you've read a part of the stream you can mostly forget it again.
The question as I understood it was one of if even in Haskell there's a
better way than sorting that means you need only have a fragment of the
stream in memory at once.

-- Mark

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: k-minima in Haskell

2007-04-12 Thread ajb
G'day all.

Quoting apfelmus [EMAIL PROTECTED]:

 You mean O(k * log n + n) of course.

Erm, yes.  You can do it in an imperative language by building a heap
in O(n) time followed by removing k elements, in O(k log n) time.

Cheers,
Andrew Bromage
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe