Hi George,

George Giorgidze wrote:
I would like to announce the first release of the set-monad library.

On Hackage: http://hackage.haskell.org/package/set-monad

Very cool. Seems to work fine. But I am wondering about the impact of using your package on asymptotic complexity (and thereby, on performance).


From your implementation:
data Set a where
  Prim   :: (Ord a) => S.Set a -> Set a
  [...]
  Zero   :: Set a
  Plus   :: Set a -> Set a -> Set a

I notice that this part of your datatype looks like a binary tree of sets. Lets see how your run function treats it:

run :: (Ord a) => Set a -> S.Set a
run (Prim s)              = s
[...]
run (Zero)                = S.empty
run (Plus ma mb)          = S.union (run ma) (run mb)

As expected, the Prim-Zero-Plus structure is treated as a binary tree of sets, and run collects all the sets together. That is probably fine, because it should have the same complexity as combining these sets in the first place.

run (Bind (Prim s) f)     = S.foldl' S.union S.empty (S.map (run . f) s)
[...]
run (Bind Zero _)         = S.empty
run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f))
[...]

But when I use the binary tree of sets on the left-hand side of a bind, your run function has to naively traverse the whole tree. So if the same elements are included in many sets in the tree of sets, these elements will be processed more than once, so the overall complexity is worse than necessary.



Here's a ghci session that seems to confirm my suspicion. I first define a function using set-monad's convenient monad instance for sets:
$ :m +Control.Monad Data.Set.Monad
$ let s1 `times` s2 = do {e1 <- s1; e2 <- s2; return (e1, e2)}

Now I produce some test data:
$ let numbers = fromList [1 .. 1000]
$ let unioned = numbers `union` numbers
$ let mplused = numbers `mplus` numbers

Note that these three sets are all equivalent.
$ (numbers == unioned, numbers == mplused, unioned == mplused)
(True, True, True)

However, they behave differently when passed to the times function above. numbers and unioned are similarly "fast":
$ :set +s
$ size $ numbers `times` numbers
1000000
(2.56 secs, 1315452984 bytes)

$ size $ unioned `times` unioned
(2.39 secs, 1314950600 bytes)
1000000

(Why is unioned faster then numbers? Is union doing some rebalancing? Can I trigger that effect directly?)

But mplused is much slower:
$ size $ mplused `times` mplused
1000000
(10.83 secs, 5324731444 bytes)


I suspect that I can achieve similar results by using the list monad and converting to a set in the very end. In what situations can I benefit from set-monad?

  Tillmann

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to