Re: [Haskell-cafe] ANNOUNCE: set-monad

2012-06-22 Thread Tillmann Rendel

Hi George,

thanks for your detailed reply.

George Giorgidze wrote:

The key to the approach used in set-monad is to make progress with the
evaluation of the unconstrained constructors (i.e., Return, Bind, Zero
and Plus) without using constrained set-specific operations. It turns
out that for several cases one can progress with the evaluation
without duplicating f (evaluation relies on monoid laws, Plus is
associative and Zero is left and right identity of Plus). I have
pushed those optimisations to the repo. These optimisations also cover
your example.


Cool.


In your email, you have also asked:


I suspect that I can achieve similar results by using the list monad and
converting to a set in the very end. In what situations can I benefit from
set-monad?


Sometimes set is a more appropriate data structure than list. [...]


Of course. But I was wondering whether something like set-monad could be 
implemented with


  newtype Set a = Set [a]

instead of

  data Set a = Prim ... | Return ... | Bind ... | ...


Both approaches can offer the same interface, but now I think I see two 
reasons why the latter is more efficient than the former: (1) It allows 
to use set-operations when an Ord is known, for example, when the user 
calls union, and (2) It allows optimizations like you describe above.


  Tillmann

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: set-monad

2012-06-21 Thread George Giorgidze
Hi Tillmann,

Thanks for your interesting question regarding the performance
overheads of the Data.Set.Monad wrapper compared to the original
Data.Set library.

If you use set-specific functions there will not be any difference in
asymptotic complexity between Data.Set.Monad and Data.Set.

In terms of raw performance, there will be a tiny overhead when
unpacking sets from the Prim constructor. But this should be
negligible, as set-specific operations performed using Data.Set under
the hood will likely dominate the runtime.

Now, how does the runtime complexity of overloaded functions from the
Functor, Monad and other supported type classes compared to the
set-specific ones?

You are right that there is a problem when mplus is composed with =.
I have not done a proper investigation and benchmarking, but I think
(and hope) this should be the only problematic case when it comes to
runtime complexity. Luckily, there is an easy workaround (see below).

Let us take a detailed look at this case:

run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f))

Should ma mb be equal, this will run the function f twice
(unnecessarily). So does this mean that there is only the factor of 2
slowdown compared to the case where the sets are combined before the
bind. Unfortunately, the answer is no, ma and mb could be constructed
with Plus as well. In the worst case the slowdown is k, where k is the
number equal sets in the tree of Pluses (in the worst case all sets in
the tree are equal) in the left hand side of the bind.

The key to the approach used in set-monad is to make progress with the
evaluation of the unconstrained constructors (i.e., Return, Bind, Zero
and Plus) without using constrained set-specific operations. It turns
out that for several cases one can progress with the evaluation
without duplicating f (evaluation relies on monoid laws, Plus is
associative and Zero is left and right identity of Plus). I have
pushed those optimisations to the repo. These optimisations also cover
your example.

But there is one case for which I have not found a way to progress
with the evaluation without duplicating f:

run (Bind (Plus (Bind _ _) (Bind _ _)) f)

For this case, I am still duplicating f. I will have a look whether it
is possible to avoid duplication.

Luckily, the aforementioned limitation can be avoided by using the
mappend function from the Monoid type class instead of mplus from the
MonadPlus type class. Also the desugaring of the do and monad
comprehension notations do not produce calls to mplus.

In your email, you have also asked:

 I suspect that I can achieve similar results by using the list monad and
 converting to a set in the very end. In what situations can I benefit from
 set-monad?

Sometimes set is a more appropriate data structure than list. That is
why the modules like Data.Set and Data.HashSet exist in the first
place. Different operations on sets and lists may have different
runtime complexities. For example, if your problem requires frequent
checking of whether an element exists in a collection then set may be
more appropriate than list. This is just one example.

More generally, if the programmer does not care about order or does
not want duplicates in a collection, again set seems to be more
appropriate than list. If the programmer uses lists in this scenario,
she needs to explicitly eliminate duplicates. In addition to being
inconvenient (if the programmer has to do it frequently), duplicate
elimination for lists is an expensive operation.

The set-monad library itself (by exporting the Monad instance for
sets) could also have an educational value, by allowing the
comprehension notation to be used for sets (and not just for lists).
This recreates the original set comprehension notation from
mathematics in Haskell. Math definitions that use comprehensions can
be directly transcribed to Haskell. It could also be useful to
highlight conceptual differences between lists and sets in Haskell
without different notations getting in the way.

Cheers, George

On 16 June 2012 09:57, Tillmann Rendel ren...@informatik.uni-marburg.de wrote:
 Hi George,


 George Giorgidze wrote:

 I would like to announce the first release of the set-monad library.

 On Hackage: http://hackage.haskell.org/package/set-monad


 Very cool. Seems to work fine. But I am wondering about the impact of using
 your package on asymptotic complexity (and thereby, on performance).


 From your implementation:

 data Set a where
  Prim   :: (Ord a) = S.Set a - Set a
  [...]
  Zero   :: Set a
  Plus   :: Set a - Set a - Set a


 I notice that this part of your datatype looks like a binary tree of sets.
 Lets see how your run function treats it:

 run :: (Ord a) = Set a - S.Set a
 run (Prim s)              = s
 [...]
 run (Zero)                = S.empty
 run (Plus ma mb)          = S.union (run ma) (run mb)


 As expected, the Prim-Zero-Plus structure is treated as a binary tree of
 sets, and run collects all the sets together. That 

Re: [Haskell-cafe] ANNOUNCE: set-monad

2012-06-17 Thread Tillmann Rendel

Hi,

David Menendez wrote:

As you noticed, you can get somewhat better performance by using the
combinators that convert to S.Set internally, because they eliminate
redundant computations later on.


Somewhat better? My example was three times faster, and I guess that 
the fast variant is O(n) and the slow variant is O(n²). So I expect that 
for some applications, the Set interface is more than fast enough and 
the MonadPlus-interface is much too slow.



(Why is unioned faster then numbers? Is union doing some rebalancing? Can I
trigger that effect directly?)


It's because mplus a b= f turns into mplus (a= f) (b= f),
whereas unioned takes the union before calling f.


Here, you compare mplused to unioned, but in the parentheses, I asked 
about numbers and unioned (from my last email).


  Tillmann

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: set-monad

2012-06-17 Thread David Menendez
On Sun, Jun 17, 2012 at 2:26 AM, Tillmann Rendel
ren...@informatik.uni-marburg.de wrote:
 Hi,


 David Menendez wrote:

 As you noticed, you can get somewhat better performance by using the
 combinators that convert to S.Set internally, because they eliminate
 redundant computations later on.

 Somewhat better? My example was three times faster, and I guess that the
 fast variant is O(n) and the slow variant is O(n²). So I expect that for
 some applications, the Set interface is more than fast enough and the
 MonadPlus-interface is much too slow.

Yes, that should have been significantly better.

 (Why is unioned faster then numbers? Is union doing some rebalancing? Can
 I
 trigger that effect directly?)


 It's because mplus a b= f turns into mplus (a= f) (b= f),
 whereas unioned takes the union before calling f.

 Here, you compare mplused to unioned, but in the parentheses, I asked about
 numbers and unioned (from my last email).

You're right. That may have been caused by the time to compute numbers
itself; I saw that numbers `times` numbers was faster than unioned
`times` unioned the second time I ran it.



Additionally, I haven't done any serious performance testing, but
there also seems to be a speedup when the following lines are added to
run:

run (Bind (Plus (Prim sa) mb) f) = run (Bind (S.union sa (run mb)) f)
run (Bind (Plus ma (Prim sb)) f) = run (Bind (S.union (run ma) sb) f)



-- 
Dave Menendez d...@zednenem.com
http://www.eyrie.org/~zednenem/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: set-monad

2012-06-16 Thread Tillmann Rendel

Hi George,

George Giorgidze wrote:

I would like to announce the first release of the set-monad library.

On Hackage: http://hackage.haskell.org/package/set-monad


Very cool. Seems to work fine. But I am wondering about the impact of 
using your package on asymptotic complexity (and thereby, on performance).



From your implementation:

data Set a where
  Prim   :: (Ord a) = S.Set a - Set a
  [...]
  Zero   :: Set a
  Plus   :: Set a - Set a - Set a


I notice that this part of your datatype looks like a binary tree of 
sets. Lets see how your run function treats it:



run :: (Ord a) = Set a - S.Set a
run (Prim s)  = s
[...]
run (Zero)= S.empty
run (Plus ma mb)  = S.union (run ma) (run mb)


As expected, the Prim-Zero-Plus structure is treated as a binary tree of 
sets, and run collects all the sets together. That is probably fine, 
because it should have the same complexity as combining these sets in 
the first place.



run (Bind (Prim s) f) = S.foldl' S.union S.empty (S.map (run . f) s)
[...]
run (Bind Zero _) = S.empty
run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f))
[...]


But when I use the binary tree of sets on the left-hand side of a bind, 
your run function has to naively traverse the whole tree. So if the same 
elements are included in many sets in the tree of sets, these elements 
will be processed more than once, so the overall complexity is worse 
than necessary.




Here's a ghci session that seems to confirm my suspicion. I first define 
a function using set-monad's convenient monad instance for sets:

$ :m +Control.Monad Data.Set.Monad
$ let s1 `times` s2 = do {e1 - s1; e2 - s2; return (e1, e2)}


Now I produce some test data:

$ let numbers = fromList [1 .. 1000]
$ let unioned = numbers `union` numbers
$ let mplused = numbers `mplus` numbers


Note that these three sets are all equivalent.

$ (numbers == unioned, numbers == mplused, unioned == mplused)
(True, True, True)


However, they behave differently when passed to the times function 
above. numbers and unioned are similarly fast:

$ :set +s
$ size $ numbers `times` numbers
100
(2.56 secs, 1315452984 bytes)

$ size $ unioned `times` unioned
(2.39 secs, 1314950600 bytes)
100


(Why is unioned faster then numbers? Is union doing some rebalancing? 
Can I trigger that effect directly?)


But mplused is much slower:

$ size $ mplused `times` mplused
100
(10.83 secs, 5324731444 bytes)



I suspect that I can achieve similar results by using the list monad and 
converting to a set in the very end. In what situations can I benefit 
from set-monad?


  Tillmann

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: set-monad

2012-06-16 Thread David Menendez
On Sat, Jun 16, 2012 at 3:57 AM, Tillmann Rendel
ren...@informatik.uni-marburg.de wrote:
 George Giorgidze wrote:

 I would like to announce the first release of the set-monad library.

 On Hackage: http://hackage.haskell.org/package/set-monad


 Very cool. Seems to work fine. But I am wondering about the impact of using
 your package on asymptotic complexity (and thereby, on performance).

For programs using only the Monad/MonadPlus interface, I would expect
it to have the same asymptotic complexity as [] or Cont (S.Set a).

As you noticed, you can get somewhat better performance by using the
combinators that convert to S.Set internally, because they eliminate
redundant computations later on.

 (Why is unioned faster then numbers? Is union doing some rebalancing? Can I
 trigger that effect directly?)

It's because mplus a b = f turns into mplus (a = f) (b = f),
whereas unioned takes the union before calling f.

You can force this by defining:

simplify :: (Ord a) = Set a - Set a
simplify = Prim . run

Unfortunately, there doesn't seem to be any equivalent of Prim in the
exported interface. I guess doing simplify = union empty would work.

-- 
Dave Menendez d...@zednenem.com
http://www.eyrie.org/~zednenem/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] ANNOUNCE: set-monad

2012-06-15 Thread George Giorgidze
I would like to announce the first release of the set-monad library.

On Hackage: http://hackage.haskell.org/package/set-monad

The set-monad library exports the Set abstract data type and
set-manipulating functions. These functions behave exactly as their
namesakes from the Data.Set module of the containers library. In
addition, the set-monad library extends Data.Set by providing Functor,
Applicative, Alternative, Monad, and MonadPlus instances for sets.

In other words, you can use the set-monad library as a drop-in
replacement for the Data.Set module of the containers library and, in
addition, you will also get the aforementioned instances which are not
available in the containers package.

It is not possible to directly implement instances for the
aforementioned standard Haskell type classes for the Set data type
from the containers library. This is because the key operations map
and union, are constrained with Ord as follows.

map :: (Ord a, Ord b) = (a - b) - Set a - Set b
union :: (Ord a) = Set a - Set a - Set a

The set-monad library provides the type class instances by wrapping
the constrained Set type into a data type that has unconstrained
constructors corresponding to monadic combinators. The data type
constructors that represent monadic combinators are evaluated with a
constrained run function. This elevates the need to use the
constraints in the instance definitions (this is what prevents a
direct definition). The wrapping and unwrapping happens internally in
the library and does not affect its interface.

For details, see the rather compact definitions of the run function
and type class instances. The left identity and associativity monad
laws play a crucial role in the definition of the run function. The
rest of the code should be self explanatory.

The technique is not new. This library was inspired by [1]. To my
knowledge, the original, systematic presentation of the idea to
represent monadic combinators as data is given in [2]. There is also a
Haskell library that provides a generic infrastructure for the
aforementioned wrapping and unwrapping [3].

The set-monad library is particularly useful for writing set-oriented
code using the do and/or monad comprehension notations. For example,
the following definitions now type check.

 s1 :: Set (Int,Int)
 s1 = do a - fromList [1 .. 4]
 b - fromList [1 .. 4]
 return (a,b)

 -- with -XMonadComprehensions
 s2 :: Set (Int,Int)
 s2 = [ (a,b) | (a,b) - s1, even a, even b ]

 s3 :: Set Int
 s3 = fmap (+1) (fromList [1 .. 4])

As noted in [1], the implementation technique can be used for monadic
libraries and EDSLs with restricted types (compiled EDSLs often
restrict the types that they can handle). Haskell's standard monad
type class can be used for restricted monad instances. There is no
need to resort to GHC extensions that rebind the standard monadic
combinators with the library or EDSL specific ones.

[1] CSDL Blog: The home of applied functional programming at KU. Monad
Reification in Haskell and the Sunroof Javascript compiler.
http://www.ittc.ku.edu/csdlblog/?p=88

[2] Chuan-kai Lin. 2006. Programming monads operationally with Unimo.
In Proceedings of the eleventh ACM SIGPLAN International Conference on
Functional Programming (ICFP '06). ACM.

[3] Heinrich Apfelmus. The operational package.
http://hackage.haskell.org/package/operational

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe