Re: [Haskell-cafe] performance question

wren ng thornton Thu, 14 Feb 2013 15:09:07 -0800

On 2/13/13 11:18 PM, wren ng thornton wrote:

On 2/13/13 11:32 AM, Nicolas Bock wrote:

Since I have very little experience with Haskell and am not used to
Haskell-think yet, I don't quite understand your statement that
regexes are
seen as foreign to Haskell-think. Could you elaborate? What would a more
"native" solution look like? From what I have learned so far, it seems to
me that Haskell is a lot about clear, concise, and well structured
code. I
find regexes extremely compact and powerful, allowing for very concise
code, which should fit the bill perfectly, or shouldn't it?


Regexes are powerful and concise for recognizing regular languages. They
are not, however, very good for *parsing* regular languages; nor can
they handle non-regular languages (unless you're relying on the badness
of pcre). In other languages people press regexes into service for
parsing because the alternative is using an external DSL like lex/yacc,
javaCC, etc. Whereas, in Haskell, we have powerful and concise tools for
parsing context-free languages and beyond (e.g., parsec, attoparsec).

Just to be clear, the problem isn't that proper regexes are only goodfor regular languages (many files have regular syntax afterall). Theproblem is that regexes are only good for recognition. They're anexcellent tool for deciding whether a given string is "good" or "bad";but they're completely unsuitable for the task of parsing/interpreting astring into some structure or semantic response. If you've ever usedtools like yacc or javaCC, one of the crucial things they offer is theability to add these semantic responses. Parser combinator libraries inHaskell are similar, since the string processing is integrated into aprogramming language so we can say things like:


    myParser = do
        x <- blah
        guard (p x)
        y <- blargh
        return (f x y)

where p and f can be an arbitrary Haskell functions. Perl extends onregular expressions to try and do things like this, but it's extremelybaroque, hard to get right, and impossible to maintain. (N.B., I wasraised on Perl and still love it.) And at some point we have to callinto question the idea of regexes as an embedded DSL when we then turnaround and try to have Perl be a DSL embedded into the regex language.

One of the big things that makes regexes so nice is that they identifycrucial combinators like choice and repetition. However, once thosecombinators have been identified, we can just offer them directly asfunctions in the host language. No need for a special DSL or specialsyntax. The big trick is doing this efficiently. Parser combinators werean academic curiosity for a long time until Parsec came around and madethem efficient. And we've come a long way since then: with things likeattoparsec, PEG parsing, and non-monadic applicative parsers (which canperform more optimizations because they can identify the structure ofthe grammar).

The theory of regular expressions is indeed beautiful and elegant.However, it's a theory of recognition, not a theory of parsing; andthat's a crucial distinction. Haskell is about clear, concise, andwell-structured code; but to be clear, concise, and well-structured wehave to choose the right tool for the job.


--
Live well,
~wren

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] performance question

Reply via email to