(I'll be brief because my head is hurting, but please don't interpret that as an intent to offend)
A few points: 1) Capture groups are all you need to do some meaningful interpretation of data; these were around long before perl. 2) Yacc is typically used in conjunction with lex, partly for (a) efficiency and partly for (b) ease of use (compared to writing out [a-z] as production rules). 3) I've actually used lex without yacc (well, flex without bison) when faced with dealing with a language that's regular (and easy enough to express that way - cf. an enormous finite subset of a context-free language). 2b is mostly irrelevant in Haskell, as Parsec already provides functions that can easily match the same things a regexp would. 2a, if it stands up to testing, is the best argument for ripping things apart in Haskell using a DFA. Parsec and cousins are efficient, but it's hard to beat a single table lookup per character. The questions are 1) is the difference enough to matter in many cases, and 2) is there a way to get this out of parsec without touching regexps? (It's not impossible that parsec already recognizes when a language is regular, although I'd be weakly surprised). On Thu, Feb 14, 2013 at 3:07 PM, wren ng thornton <w...@freegeek.org> wrote: > On 2/13/13 11:18 PM, wren ng thornton wrote: > >> On 2/13/13 11:32 AM, Nicolas Bock wrote: >> >>> Since I have very little experience with Haskell and am not used to >>> Haskell-think yet, I don't quite understand your statement that >>> regexes are >>> seen as foreign to Haskell-think. Could you elaborate? What would a more >>> "native" solution look like? From what I have learned so far, it seems to >>> me that Haskell is a lot about clear, concise, and well structured >>> code. I >>> find regexes extremely compact and powerful, allowing for very concise >>> code, which should fit the bill perfectly, or shouldn't it? >>> >> >> Regexes are powerful and concise for recognizing regular languages. They >> are not, however, very good for *parsing* regular languages; nor can >> they handle non-regular languages (unless you're relying on the badness >> of pcre). In other languages people press regexes into service for >> parsing because the alternative is using an external DSL like lex/yacc, >> javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for >> parsing context-free languages and beyond (e.g., parsec, attoparsec). >> > > > Just to be clear, the problem isn't that proper regexes are only good for > regular languages (many files have regular syntax afterall). The problem is > that regexes are only good for recognition. They're an excellent tool for > deciding whether a given string is "good" or "bad"; but they're completely > unsuitable for the task of parsing/interpreting a string into some > structure or semantic response. If you've ever used tools like yacc or > javaCC, one of the crucial things they offer is the ability to add these > semantic responses. Parser combinator libraries in Haskell are similar, > since the string processing is integrated into a programming language so we > can say things like: > > myParser = do > x <- blah > guard (p x) > y <- blargh > return (f x y) > > where p and f can be an arbitrary Haskell functions. Perl extends on > regular expressions to try and do things like this, but it's extremely > baroque, hard to get right, and impossible to maintain. (N.B., I was raised > on Perl and still love it.) And at some point we have to call into question > the idea of regexes as an embedded DSL when we then turn around and try to > have Perl be a DSL embedded into the regex language. > > One of the big things that makes regexes so nice is that they identify > crucial combinators like choice and repetition. However, once those > combinators have been identified, we can just offer them directly as > functions in the host language. No need for a special DSL or special > syntax. The big trick is doing this efficiently. Parser combinators were an > academic curiosity for a long time until Parsec came around and made them > efficient. And we've come a long way since then: with things like > attoparsec, PEG parsing, and non-monadic applicative parsers (which can > perform more optimizations because they can identify the structure of the > grammar). > > The theory of regular expressions is indeed beautiful and elegant. > However, it's a theory of recognition, not a theory of parsing; and that's > a crucial distinction. Haskell is about clear, concise, and well-structured > code; but to be clear, concise, and well-structured we have to choose the > right tool for the job. > > > -- > Live well, > ~wren > > ______________________________**_________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe> >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe