#4086: Data.List 'nub' function is O(n^2)
---------------------------------+------------------------------------------
    Reporter:  Pete              |        Owner:                         
        Type:  bug               |       Status:  new                    
    Priority:  normal            |    Milestone:  6.14.1                 
   Component:  libraries/base    |      Version:  6.12.1                 
    Keywords:                    |   Difficulty:  Easy (less than 1 hour)
          Os:  Unknown/Multiple  |     Testcase:                         
Architecture:  Unknown/Multiple  |      Failure:  Documentation bug      
---------------------------------+------------------------------------------
Description changed by igloo:

Old description:

> I recently discovered that some Haskell code was running much slower than
> I would have expected.  I eventually traced the problem to the 'nub'
> function in the Data.List, which ghc implements as follows:
>
> nub l                   = nub' l []
>
>   where
>
>     nub' [] _           = []
>
>     nub' (x:xs) ls
>
>         | x `elem` ls   = nub' xs ls
>
>         | otherwise     = x : nub' xs (x:ls)
>
> This would seem to be O(n**2), because it accumulates the values it sees
> in a list.  If it used a different data structure like a Set, it could be
> made O(n log n).
>
> I'm not sure whether this should be considered a bug or not.  The list
> nub returns is correct.  On the other hand, when I call a library
> function in any programming language, I would normally expect it to use
> an algorithm that provides the best achievable asymptotic performance.
>
> If you decide that you don't consider this to be a bug, can I suggest
> adding a note to the documentation, so people are aware that nub should
> only be used with short lists?

New description:

 I recently discovered that some Haskell code was running much slower than
 I would have expected.  I eventually traced the problem to the 'nub'
 function in the Data.List, which ghc implements as follows:
 {{{
 nub l                   = nub' l []
   where
     nub' [] _           = []
     nub' (x:xs) ls
         | x `elem` ls   = nub' xs ls
         | otherwise     = x : nub' xs (x:ls)
 }}}
 This would seem to be O(n**2), because it accumulates the values it sees
 in a list.  If it used a different data structure like a Set, it could be
 made O(n log n).

 I'm not sure whether this should be considered a bug or not.  The list nub
 returns is correct.  On the other hand, when I call a library function in
 any programming language, I would normally expect it to use an algorithm
 that provides the best achievable asymptotic performance.

 If you decide that you don't consider this to be a bug, can I suggest
 adding a note to the documentation, so people are aware that nub should
 only be used with short lists?

--

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4086#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to