#4086: Data.List 'nub' function is O(n^2)
---------------------------------+------------------------------------------
Reporter: Pete | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 6.14.1
Component: libraries/base | Version: 6.12.1
Keywords: | Difficulty: Easy (less than 1 hour)
Os: Unknown/Multiple | Testcase:
Architecture: Unknown/Multiple | Failure: Documentation bug
---------------------------------+------------------------------------------
Description changed by igloo:
Old description:
> I recently discovered that some Haskell code was running much slower than
> I would have expected. I eventually traced the problem to the 'nub'
> function in the Data.List, which ghc implements as follows:
>
> nub l = nub' l []
>
> where
>
> nub' [] _ = []
>
> nub' (x:xs) ls
>
> | x `elem` ls = nub' xs ls
>
> | otherwise = x : nub' xs (x:ls)
>
> This would seem to be O(n**2), because it accumulates the values it sees
> in a list. If it used a different data structure like a Set, it could be
> made O(n log n).
>
> I'm not sure whether this should be considered a bug or not. The list
> nub returns is correct. On the other hand, when I call a library
> function in any programming language, I would normally expect it to use
> an algorithm that provides the best achievable asymptotic performance.
>
> If you decide that you don't consider this to be a bug, can I suggest
> adding a note to the documentation, so people are aware that nub should
> only be used with short lists?
New description:
I recently discovered that some Haskell code was running much slower than
I would have expected. I eventually traced the problem to the 'nub'
function in the Data.List, which ghc implements as follows:
{{{
nub l = nub' l []
where
nub' [] _ = []
nub' (x:xs) ls
| x `elem` ls = nub' xs ls
| otherwise = x : nub' xs (x:ls)
}}}
This would seem to be O(n**2), because it accumulates the values it sees
in a list. If it used a different data structure like a Set, it could be
made O(n log n).
I'm not sure whether this should be considered a bug or not. The list nub
returns is correct. On the other hand, when I call a library function in
any programming language, I would normally expect it to use an algorithm
that provides the best achievable asymptotic performance.
If you decide that you don't consider this to be a bug, can I suggest
adding a note to the documentation, so people are aware that nub should
only be used with short lists?
--
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4086#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs