Re: [Haskell-cafe] I love purity, but it's killing me.
On May 27, 2009, at 1:49 AM, Conal Elliott wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What do you mean with `exponential behavior'? Exponential related to what? For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions. Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script. So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. But again, I might be missing some important point here. What's your latest wisdom about CSE in DSELs? Thanks, - Conal On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins tomahawk...@gmail.com wrote: ... -- Sebastiaan Visser (warning: messy code) [1] http://github.com/sebastiaanvisser/frp-js/blob/b4f37d3b564c4932a3019b9b580e6da9449768a8/src/Core/Compiler.hs ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] I love purity, but it's killing me.
Sebastiaan Visser wrote: On May 27, 2009, at 1:49 AM, Conal Elliott wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What do you mean with `exponential behavior'? Exponential related to what? For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions. Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script. So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. But again, I might be missing some important point here. That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Ganesh [1] http://www.earth.li/~ganesh/research/paradise-icfp08/ === Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html === ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On May 27, 2009, at 12:51 PM, Sittampalam, Ganesh wrote: Sebastiaan Visser wrote: ... But again, I might be missing some important point here. That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Ganesh [1] http://www.earth.li/~ganesh/research/paradise-icfp08/ Thanks, I just pushed your paper on top of my stack. -- Sebastiaan Visser ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Hi Wren, I considered the idea of hashing, but not *perfect* hashing. I don't know how to hash perfectly with something like expressions, which have infinitely many values. Since it's stateful, that means the smart constructors may need to be in an appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly). Hm -- stateful? Unless I'm misunderstanding, a stateful monadic/applicative approach would break the simple functional interface I'm going for. Could well be I haven't formed a mental picture that matches yours. - Conal On Tue, May 26, 2009 at 5:23 PM, wren ng thornton w...@freegeek.org wrote: Conal Elliott wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What's your latest wisdom about CSE in DSELs? Thanks, - Conal One common trick that Tom didn't seem to mention in the 2008-02-07T23:33 post is hash cons'ing. Given a perfect hash function, traverse the term bottom-up storing each (hash,subterm) pair in a memo table and replacing the subterm by its hash. Once that's done, equality checks are trivial, and the memotable can be converted to SSA rather easily. This works best if you amortize the memoization by doing it with smart constructors, so that you don't need to worry about the exponential duplication of work for expressions with DAGy structure sharing in the Haskell. Since it's stateful, that means the smart constructors may need to be in an appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly). Maybe this is the too-slow too-complex solution you're using already? -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
What do you mean with `exponential behavior'? Exponential related to what? I mean that the size of the observable tree can be exponential in the size of the unobservable dag representation. So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. In other words, the tree size blows up, and hosting in pure Haskell doesn't allow us to examine the compact dag. Are we on the same track now? - Conal On Wed, May 27, 2009 at 3:15 AM, Sebastiaan Visser sfvis...@cs.uu.nlwrote: On May 27, 2009, at 1:49 AM, Conal Elliott wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What do you mean with `exponential behavior'? Exponential related to what? For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions. Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script. So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. But again, I might be missing some important point here. What's your latest wisdom about CSE in DSELs? Thanks, - Conal On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins tomahawk...@gmail.com wrote: ... -- Sebastiaan Visser (warning: messy code) [1] http://github.com/sebastiaanvisser/frp-js/blob/b4f37d3b564c4932a3019b9b580e6da9449768a8/src/Core/Compiler.hs ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Plus unsafePerformIO and weak references as in *Stretching the storage manager: weak pointers and stable names in Haskellhttp://citeseer.ist.psu.edu/peytonjones99stretching.html *? Lacking a more elegant alternative, that's what I'll probably do again, as in Pan, Vertigo, and Pajama. - Conal On Wed, May 27, 2009 at 3:51 AM, Sittampalam, Ganesh ganesh.sittampa...@credit-suisse.com wrote: Sebastiaan Visser wrote: On May 27, 2009, at 1:49 AM, Conal Elliott wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What do you mean with `exponential behavior'? Exponential related to what? For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions. Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script. So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. But again, I might be missing some important point here. That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Ganesh [1] http://www.earth.li/~ganesh/research/paradise-icfp08/http://www.earth.li/%7Eganesh/research/paradise-icfp08/ === Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html === ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
I just remembered: Andy Gill has a new paper Type Directed Observable Sharing (http://www.ittc.ku.edu/~andygill/paper.php?label=DSLExtract09) that looks very relevant. Abstract: Haskell is a great language for writing and supporting embedded Domain Specific Languages (DSLs). Some form of observable sharing is often a critical capability for allowing so-called deep DSLs to be compiled and processed. In this paper, we describe and explore uses of an IO function for reification which allows direct observation of sharing. On Tue, May 26, 2009 at 4:49 PM, Conal Elliott co...@conal.net wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What's your latest wisdom about CSE in DSELs? Thanks, - Conal On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins tomahawk...@gmail.comwrote: I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution. My DSLs invariably define a datatype to capture expressions; something like this: data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL. The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example: let a = Add (Constant 10) (Variable i1) b = Sub (Variable i2) (Constant 2) c = Add a b would compile to a C program that may look like this: a = 10 + i1; b = i2 - 2; c = a + b; The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case a, then b, then c. The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years. Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each let expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back. Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-( Thanks for any and all help. -Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] I love purity, but it's killing me.
Yes, though we don't bother with weak pointers as we only keep the stable names map around for the duration of CSE so there's no ongoing memory leak issue. From: haskell-cafe-boun...@haskell.org [mailto:haskell-cafe-boun...@haskell.org] On Behalf Of Conal Elliott Sent: 27 May 2009 16:14 To: Sittampalam, Ganesh Cc: Haskell Cafe Subject: Re: [Haskell-cafe] I love purity, but it's killing me. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Plus unsafePerformIO and weak references as in Stretching the storage manager: weak pointers and stable names in Haskell http://citeseer.ist.psu.edu/peytonjones99stretching.html ? Lacking a more elegant alternative, that's what I'll probably do again, as in Pan, Vertigo, and Pajama. - Conal On Wed, May 27, 2009 at 3:51 AM, Sittampalam, Ganesh ganesh.sittampa...@credit-suisse.com wrote: Sebastiaan Visser wrote: On May 27, 2009, at 1:49 AM, Conal Elliott wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What do you mean with `exponential behavior'? Exponential related to what? For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions. Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script. So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. But again, I might be missing some important point here. That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Ganesh [1] http://www.earth.li/~ganesh/research/paradise-icfp08/ http://www.earth.li/%7Eganesh/research/paradise-icfp08/ === Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html === === Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html === ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Conal Elliott wrote: Hi Wren, I considered the idea of hashing, but not *perfect* hashing. I don't know how to hash perfectly with something like expressions, which have infinitely many values. An imperfect hash can work. You'll need a memo table with a source of unique symbols (e.g. storing the next unused integer) in order to, effectively, make the hash function perfect[1]. If you have a source of unique symbols then you can also use a trie, Data.Map, or similar in lieu of a hash map. In a language with pointers (or stable names), the pointer is often used as the hash in conjunction with using the memo table as an intern table for smart constructors. Thus, client code can never observe that structurally equal expressions could have different hashes. [1] hash :: HashState m = Expr - m Hash hash e = case lookupHS e of Just h - return h Nothing - do h - nextH insertHS e h return h Since it's stateful, that means the smart constructors may need to be in an appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly). Hm -- stateful? Unless I'm misunderstanding, a stateful monadic/applicative approach would break the simple functional interface I'm going for. Could well be I haven't formed a mental picture that matches yours. Er, it's only stateful for the versions above that use pointers or a source of unique symbols (since they need to maintain a memo table). If you can come up with a perfect hash function[2], then there's no need to create/store the memo table at all, since it can be reconstructed on the fly. Since perfect hashing often isn't feasible, the stateful approximations to a perfect hash function are generally used. Sorry if I was unclear. If you don't mind unsafePerformIO (or similar hacks) then you can hide the state from the type system by using the same table for the whole program. Generally for hash cons'ing you want your tables to be as large as they can be (to maximize sharing) so this shouldn't be problematic. However, for languages with scoping it can be beneficial to use separate tables to recognize when expressions need to be recomputed; so the global store might want to be something like a stack of memo tables with fall-through lookup. I believe Applicative is powerful enough to capture the sort of state passing needed since the client code can't ever make decisions based on the state. So with smart constructors (to package up the * etc) I'd think you should be able to have an EDSL that looks nice, just with a more complicated type. Perhaps the issues are with mixing pure Haskell functions into the EDSL? ... The real trick behind hash cons'ing is everywhere substituting the hash in for the sub-expression, effectively flattening all expressions into a single ply. Thus, expression constructors cons the hashes rather than cons'ing expressions. It's similar in spirit to trie'ing, but from the bottom up in the same way that dynamic programming is done. The reason for wanting to do the hashing in smart constructors, as opposed to at the end, is to maximize the benefit of dynamic programming. If all client-visible expressions are represented by hashes, then any structure sharing in the Haskell layer is sharing the hash representation, thus you don't need to traverse the shared substructure multiple times. (If you hand construct equal expressions without sharing, then you'll have to traverse each expression to prove that they're equal, but you can use that proof (the hashes) thenceforth). For host languages with destructive updates (like Smalltalk's become), you can rewrite the subterms as you traverse them, so doing it at the end isn't too bad. If you only expose smart constructors then your Expr type can recurse as whatever Hash type. If you do the hashing at the end, then you'll need to define a catamorphism on Expr. ... This is probably similar to what you're doing in Pan, Vertigo, and Pajama (I haven't read it). The general technique is elegant in its simplicity, and it's not uncommon. Though, like most dynamic programming tricks, it seems not to be as widely known as I would assume, so I thought I'd mention it in case you've missed it. [2] Into any domain of terms that can quickly answer (==), namely flat terms like Integer. Using a bounded type like Int can give better performance guarantees, but there's only so many of them. -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What's your latest wisdom about CSE in DSELs? Thanks, - Conal On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins tomahawk...@gmail.com wrote: I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution. My DSLs invariably define a datatype to capture expressions; something like this: data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL. The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example: let a = Add (Constant 10) (Variable i1) b = Sub (Variable i2) (Constant 2) c = Add a b would compile to a C program that may look like this: a = 10 + i1; b = i2 - 2; c = a + b; The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case a, then b, then c. The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years. Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each let expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back. Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-( Thanks for any and all help. -Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Conal Elliott wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What's your latest wisdom about CSE in DSELs? Thanks, - Conal One common trick that Tom didn't seem to mention in the 2008-02-07T23:33 post is hash cons'ing. Given a perfect hash function, traverse the term bottom-up storing each (hash,subterm) pair in a memo table and replacing the subterm by its hash. Once that's done, equality checks are trivial, and the memotable can be converted to SSA rather easily. This works best if you amortize the memoization by doing it with smart constructors, so that you don't need to worry about the exponential duplication of work for expressions with DAGy structure sharing in the Haskell. Since it's stateful, that means the smart constructors may need to be in an appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly). Maybe this is the too-slow too-complex solution you're using already? -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On Tue, May 26, 2009 at 6:49 PM, Conal Elliott co...@conal.net wrote: Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What's your latest wisdom about CSE in DSELs? I wasn't able to find a solution that offered both performance and elegance, so I changed the fundamental operation of the DSL (in this case, atom). When atom was still a hardware description language, the compiler would combine several user defined expressions together resulting in very wide and deep expression trees, resulting in the same problem you are observing. But when I switch the target of atom from HDL to C, the compiler no longer needed to perform the same expression expansion. And since the user defined expressions are generally shallow -- at least in the case of my applications -- atom is able to get away with exhaustive equality comparison (deriving Eq). Sorry I can't be of more help. -Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Hi Tom, So is the general strategy with observable sharing to use unsafePerformIO with Data.Unique to label expressions at construction? something like that, yes. Basically, you just need: {-# NOINLINE ref #-} ref x = unsafePerformIO (newIORef x) and you can write expressions like ref False == ref False and let x = ref False in x == x However, while referential equality is enough for sharing detection, I *suspect* it's simpler to use the fact that refs are IORefs and you can read and write them (in the IO monad). So a very simple Lava might look like module Lava (Bit,Netlist,low,high,nand2,netlist) where import Data.IORef import System.IO.Unsafe {-# NOINLINE ref #-} ref x = unsafePerformIO (newIORef x) type Ref = IORef (Maybe Int) data Bit = Gate String Ref [Bit] type Netlist = [(String, Int, [Int])] -- gate, output, inputs low = Gate low (ref Nothing) [] high = Gate high (ref Nothing) [] nand2 (a, b) = Gate nand2 (ref Nothing) [a, b] netlist :: Bit - IO Netlist netlist x = do i - newIORef (0 :: Int) ; f i x where f i (Gate str r xs) = do val - readIORef r num - readIORef i case val of Nothing - do writeIORef r (Just num) writeIORef i (num+1) rest - mapM (f i) xs let is = map ((\(g,o,is) - o) . head) rest return ((str,num,is):concat rest) Just j - return [(indirection,j,[])] -- explicit sharing! Indirections can be filtered out at the end, they don't actually give the netlist any information. Of course, now that you have me reading up on Yhc.Core, option #5 is looking considerably more fun. Yeah, I think Yhc.Core is pretty nifty too. Thank Neil! Matt. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Matthew Naylor wrote: [snip] Finally, when I say observable sharing, I don't necessarily mean it as defined by Koen Claessen and David Sands. I simply mean the use of unsafePerformIO to detect sharing, whether or not this is done by an eq predicate on Refs. (I say this because I think there are simpler ways to detect sharing, though these will probably not have the nice semantic properties of observable sharing.) ghc actually provides a primop for this: reallyUnsafePtrEquality# :: a - a - Int# Use at your own risk. Note that you can only check for equality uing that primop. To detect cycles in data structures efficiently, a total order would be better, but in the presence of copying garbage collection that's asking too much. Bertram ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On Feb 9, 2008 12:34 PM, Bertram Felgenhauer [EMAIL PROTECTED] wrote: ghc actually provides a primop for this: reallyUnsafePtrEquality# :: a - a - Int# Use at your own risk. Why is it more than unsafe? 'unsafePerformIO' seems to me a lot unsafer than 'reallyUnsafePtrEquality#'. Also, is anybody using 'reallyUnsafePtrEquality#' on a working project? Cheers, -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Matthew Naylor wrote: (snip) Now, there remains the concern that Haskell's semantics does not enforce sharing. A Haskell compiler is free to change the sharing a program at a whim, unknowingly to the programmer who may be relying on it in for an efficient program. However, to my knowledge, it is an unwritten rule of Haskell compilers that sharing *is* preserved, and that they do perform *graph* reduction. That is not true anymore for the threaded runtime of ghc. If two threads demand the same thunk, one of them will usually block, but there is a small window where both threads can start evaluting the expression. To prevent this, you'd have to take a lock or otherwise synchronize the threads upon entering each thunk, which is prohibitively expensive. See Haskell on a Shared-Memory Multiprocessor, http://www.haskell.org/~simonmar/papers/multiproc.pdf for details, section 3.1 in particular. Bertram ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Hi Tom, In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime. you can embed algebraic data types and pattern matching in Haskell with your own semantics, and retain type inference. It goes something like this: (nil, (|)) = datatype (cons0 [] \/ cons2 (:)) map f xs = match xs rules where rules (x, xs) = [ nil -- nil , x | xs -- f x | map f xs ] here, map :: (Term a - Term b) - Term [a] - Term [b]. The main issue is that you have to quantify the free variables in patterns, hence the rules function. I don't know if this helps you. Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation? The Yhc.Core library is very simple to use and fairly mature (Neil's been tweeking it for about 3 years now), but it doesn't meet your type-annotated requirement. (Untyped core is still pretty useful, though.) If you go the real compiler route, would it not make sense to take the DSL as the source language rather than Haskell? Or are the DSL and Haskell quite similar? Or perhaps you are thinking of a two language system, where some code is evaluated at compile time by Haskell, and some is compiled to the target language? If so, maybe you just want domain specific syntax inside a Haskell program, in which case the paper Why it's nice to be quoted: quasiquoting for haskell might be relevant (it's actually supported in GHC I think). Anyway, all very thought provoking! Matt. P.S. Tom Hawkins wrote: Emil Axelsson wrote: I know of a few of ways to express sharing in a pure language: 1) Observable sharing, which, in general, is unsafe. 2) Using Template Haskell 3) Matthew Naylor has done some work on expressible sharing, which has 4) Use a monad (but I'm sure this is what you're trying to avoid). Or... 5) Forget embedding the DSL, and write a direct compiler. Taking options 2 or 5 just to solve the sharing problem sounds to me like a lot of hard work for little reward. But don't worry, I won't repeat my observable sharing speech. :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On Fri, 8 Feb 2008, Tom Hawkins wrote: 5) Forget embedding the DSL, and write a direct compiler. In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime. Also in a pure Haskell library you will try to avoid direct access to constructors, because the internal data structures might change. Better are functions that access the internal data of a type, like 'maybe' and 'either' for 'Maybe' and 'Either', respectively. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On Feb 9, 2008 12:28 AM, Tom Hawkins [EMAIL PROTECTED] wrote: 5) Forget embedding the DSL, and write a direct compiler. In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime. Only partly true. Probably you are not aware of them (I myself learned about its existence a few days ago) but pattern quasiquoting (available in GHC's HEAD) can be used for that. http://www.haskell.org/ghc/dist/current/docs/users_guide/template-haskell.html#th-quasiquotation Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation? If you have to write a compiler why not define a language which fits better with the semantics of the embedded language instead of using plain Haskell? The approach you propose has the disadvantages of both the embedded and the standalone languages. On one hand you have to stick with the syntax of the host language which may not fit with your exact semantical requirements and, on the other hand, you cannot take advantage of all the existing machinery around the host language (you have to code your own compiler). Furthermore, the first citizen status of functions make it impossible (or really difficult at least) to compile EDSL descriptions avoiding runtime and simply applying a static analysis approach (using Core or plain Haskell as input). And thanks for the link to John's paper describing Hydra's use of Template Haskell. I will definiately consider TH. Well, TH would be one of those static analysis approaches. Actually, O'Donell's implementation (which uses an outdated version of Template Haskell) only works with a small Haskell subset. So using TH you'd probably be changing the host language anyhow. Furthermore, the TH approach consists in adding node labels by preprocessing the EDSL description, making sharing observable. That makes the original EDSL description inpure. The only difference is that side effects are added by preprocessing instead of using runtime unsafe functions. Some pointers covering the topic: [1] and [2] summarize what are the alternatives to observe sharing in Haskell whereas [3] compares the embedded approach vs standalone approach and advocates the last one. 1) http://www.imit.kth.se/~ingo/MasterThesis/ThesisAlfonsoAcosta2007.pdf (section 2.4.1 and 3.1) 2) http://www.cs.um.edu.mt/svrg/Papers/csaw2006-01.pdf (section 3) 3) http://web.cecs.pdx.edu/~sheard/papers/secondLook.ps ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On Fri, 8 Feb 2008, Matthew Naylor wrote: Now recall that referential transparency lets you replace equals with equals without changing the *value produced* by a program. Note that it says nothing about preserving *runtime behaviour*. Sharing, for example, may be lost. So if you do equational reasoning on function f (above), and loose some sharing, then you can only expect that the same sharing will also be also lost in the generated program. As long as the generated program computes the same result as it did before, referential transparency will be, overall, preserved; it would only be lost intermediately. This is what I mean by safe. I think there are degrees of observability. If a Haskell library immediately talks to a C library and shares resources generated by the library, then this sharing can be hardly observed and the method is somehow safe. If you generate a C program with Haskell and write it to a disk it can be easily observed and people might rely on a particular resulting C program. If the C program is piped to a C compiler which is immediately run, then sharing can be hardly observed. Even within Haskell sharing is somehow observable, the Haskell program could observe the free memory of the machine and thus it can see a difference between sharing and duplicated objects. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Hi Matt, On Feb 9, 2008 1:07 PM, Matthew Naylor [EMAIL PROTECTED] wrote: If you go the real compiler route, would it not make sense to take the DSL as the source language rather than Haskell? Or are the DSL and Haskell quite similar? The two are nearly identical. In fact the only significant difference between the languages is the semantics of top level monad; it wouldn't be IO, but something else. With the syntax the same, it could leverage much of Haskell's standard library. Or perhaps you are thinking of a two language system, where some code is evaluated at compile time by Haskell, and some is compiled to the target language? Not necessarily in the same compilation flow, but I can think of several scenarios where it would be advantageous for code written in this other language to be pulled into a conventional Haskell program. Taking options 2 or 5 just to solve the sharing problem sounds to me like a lot of hard work for little reward. But don't worry, I won't repeat my observable sharing speech. :-) So is the general strategy with observable sharing to use unsafePerformIO with Data.Unique to label expressions at construction? Ahh...clever! I did not think of this. Of course, now that you have me reading up on Yhc.Core, option #5 is looking considerably more fun. -Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Hello Tom, Friday, February 8, 2008, 9:33:35 AM, you wrote: The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. in similar situation, i've added hash field to each node, initialized by smart constructor: data Expression = Add Hash Expression Expression | ... type Hash=Int add x y = Add (x*y+1234567) x y ... -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On Fri, 8 Feb 2008, Tom Hawkins wrote: I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution. It seems to become a FAQ. I think all DSLs suffer from the same problems: sharing and recursion. I've used wrappers for CSound, SuperCollider, MetaPost, they all have these problems. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
Hi, (Warning: longish message!) There is some concern, and rightly so, that observable sharing is dangerous, and that your Haskell program will explode if you use it, and perhaps even that anyone who uses it is dirty and should be sent to matron's for a good scrubbing! However, when used safely, it is not a hack, nor even dirty, but a nice, simple solution to an otherwise nasty problem. Below I define what I mean by safely. First consider the circumstances under which obserevable sharing is useful. Typically we have some Haskell function f that produces a symbolic representation of an expression. For whatever reason, we'd like to *generate a program* that computes the value of this expression, rather that computing it in Haskell. For example, in Lava, we want to generate a VHDL program so that the expression can be computed on, say, an FPGA. In Tom's case, he wants to generate a C program to compute the expression. All perfectly reasonable, and in my opinion, a very powerfull way to use Haskell. Now recall that referential transparency lets you replace equals with equals without changing the *value produced* by a program. Note that it says nothing about preserving *runtime behaviour*. Sharing, for example, may be lost. So if you do equational reasoning on function f (above), and loose some sharing, then you can only expect that the same sharing will also be also lost in the generated program. As long as the generated program computes the same result as it did before, referential transparency will be, overall, preserved; it would only be lost intermediately. This is what I mean by safe. Now, there remains the concern that Haskell's semantics does not enforce sharing. A Haskell compiler is free to change the sharing a program at a whim, unknowingly to the programmer who may be relying on it in for an efficient program. However, to my knowledge, it is an unwritten rule of Haskell compilers that sharing *is* preserved, and that they do perform *graph* reduction. Clean, a similar language to Haskell, indeed has a semantics based on graphs. So I don't believe Haskell being non-strict (not necessarily lazy) is a good reason for not using observable sharing. Though I do feel better when I compile without -O. :-) Finally, when I say observable sharing, I don't necessarily mean it as defined by Koen Claessen and David Sands. I simply mean the use of unsafePerformIO to detect sharing, whether or not this is done by an eq predicate on Refs. (I say this because I think there are simpler ways to detect sharing, though these will probably not have the nice semantic properties of observable sharing.) Sorry for the somewhat long exposition. :-) Matt. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On 2/8/08, Emil Axelsson [EMAIL PROTECTED] wrote: I know of a few of ways to express sharing in a pure language: 1) Observable sharing, which, in general, is unsafe. 2) Using Template Haskell 3) Matthew Naylor has done some work on expressible sharing, which has 4) Use a monad (but I'm sure this is what you're trying to avoid). Or... 5) Forget embedding the DSL, and write a direct compiler. In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime. Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation? And thanks for the link to John's paper describing Hydra's use of Template Haskell. I will definiately consider TH. -Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
On 2/8/08, Matthew Naylor [EMAIL PROTECTED] wrote: it in for an efficient program. However, to my knowledge, it is an unwritten rule of Haskell compilers that sharing *is* preserved, and that they do perform *graph* reduction. Clean, a similar language to I'm not sure that programmers ought to be relying on this rule. Sure, all Haskell compilers I know of preserve sharing and do graph reduction. But conventional wisdom is not the same thing as an unwritten rule. Someday, someone might come along and write a Haskell compiler that isn't based on graph reduction and doesn't preserve sharing at the implementation level (while still preserving the informal semantics of Haskell). A programmer who had written code that failed to compile correctly under this hypothetical compiler would be a very naughty Haskell programmer indeed. Haskell, indeed has a semantics based on graphs. So I don't believe Haskell doesn't have a semantics, graph-based or not... or at least not a formal one, and if not a formal one, I don't know what you mean :-) Cheers, Tim -- Tim Chevalier * http://cs.pdx.edu/~tjc * Often in error, never in doubt There are no sexist decisions to be made. There are antisexist decisions to be made. And they require tremendous energy and self-scrutiny, as well as moral stamina... -- Samuel R. Delany ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
tomahawkins: On 2/8/08, Emil Axelsson [EMAIL PROTECTED] wrote: I know of a few of ways to express sharing in a pure language: 1) Observable sharing, which, in general, is unsafe. 2) Using Template Haskell 3) Matthew Naylor has done some work on expressible sharing, which has 4) Use a monad (but I'm sure this is what you're trying to avoid). Or... 5) Forget embedding the DSL, and write a direct compiler. In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime. Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation? Taking the output of GHC's intermediate phase, after optimising leaves you with type checked, optimised, 'Core' -- basically lambda calculus with extras. It's a good start if you then want to hand-compile that down. Extract it with -fext-core -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
As I pointed out a few days ago in another thread, you can benefit from using Observable sharing [1] Be warned that Observable sharing is a non-conservative extension of Haskell and it breaks referential transparency. [1] http://www.cs.chalmers.se/~koen/pubs/entry-asian99-lava.html On Feb 8, 2008 7:33 AM, Tom Hawkins [EMAIL PROTECTED] wrote: I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution. My DSLs invariably define a datatype to capture expressions; something like this: data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL. The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example: let a = Add (Constant 10) (Variable i1) b = Sub (Variable i2) (Constant 2) c = Add a b would compile to a C program that may look like this: a = 10 + i1; b = i2 - 2; c = a + b; The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case a, then b, then c. The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years. Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each let expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back. Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-( Thanks for any and all help. -Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I love purity, but it's killing me.
I know of a few of ways to express sharing in a pure language: 1) Observable sharing, which, in general, is unsafe. http://www.cs.chalmers.se/~koen/pubs/entry-asian99-lava.html 2) Using Template Haskell http://www.dcs.gla.ac.uk/publications/PAPERS/7524/EmbedHDLinTH.ps 3) Matthew Naylor has done some work on expressible sharing, which has advantages over both methods above. I don't have any reference though... 4) Use a monad (but I'm sure this is what you're trying to avoid). / Emil On 2008-02-08 07:33, Tom Hawkins wrote: I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution. My DSLs invariably define a datatype to capture expressions; something like this: data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL. The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example: let a = Add (Constant 10) (Variable i1) b = Sub (Variable i2) (Constant 2) c = Add a b would compile to a C program that may look like this: a = 10 + i1; b = i2 - 2; c = a + b; The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case a, then b, then c. The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example: let d = Add c c e = Add d d-- e now as 16 leaf nodes. As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years. Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each let expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back. Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-( Thanks for any and all help. -Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe