Simon,

Happy is a domain-specific language (DSL), and as such it likely makes stating the problem simpler. A Turing-complete language admits pathological cases and convoluted expression, both of which are forms of pathology where automation is concerned, so what you said makes sense to me. Parsec, though by design it cannot be Turing complete itself, uses a Turing-complete language to achieve its goals. This invites syntactic complexity and convoluted expression; only the pathological cases are likely ruled out.
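
To illustrate what embedding in a Turing-complete host buys, for better and worse, here is a minimal sketch of my own using Text.Parsec (the keyword list is hypothetical):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- A combinator parser is an ordinary Haskell value, so plain list
    -- functions can assemble it; a generator's grammar file cannot be
    -- built up this way inside the program.
    keyword :: Parser String
    keyword = choice (map (try . string) ["include", "ifdef", "endif"])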

Suppose someone set out to create a Haskell compiler using Parsec, and suppose this caused the compiler to take twice as long to compile code as GHC does. Would such a compiler be competitive with GHC? It would put the product into retrograde if this were thought an improvement, which speaks to the risk involved. I do not anticipate working with ad-hoc or especially ambiguous grammars; the grammars are context-free grammars (CFGs).

I hope to create something that will parse a makefile, something capable of becoming a product in its own right, though that is not my long-term goal. Slow is not on the menu: make has a reputation for speed. This is what I hope to achieve in phase one, and there is method to my madness.
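
To make phase one concrete, a first cut might look like the following (a minimal sketch of my own using Parsec, ignoring variables, line continuations, and most of real make syntax; the Rule type and its field names are mine):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- A make rule: target, dependencies, and its command lines.
    data Rule = Rule
      { ruleTarget :: String
      , ruleDeps   :: [String]
      , ruleCmds   :: [String]
      } deriving Show

    -- Horizontal whitespace only; newlines are significant in make.
    hspace :: Parser ()
    hspace = skipMany (oneOf " \t")

    word :: Parser String
    word = many1 (noneOf " \t\n:")

    -- "target: dep1 dep2" on one line, then tab-prefixed command lines.
    rule :: Parser Rule
    rule = do
      t <- word
      hspace
      _ <- char ':'
      hspace
      ds <- many (word <* hspace)
      _ <- newline
      cs <- many (char '\t' *> manyTill anyChar newline)
      return (Rule t ds cs)

For example, parseTest rule "all: foo.o bar.o\n\tgcc -o all foo.o bar.o\n" prints a Rule with two dependencies and one command line.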

Why is this important? It creates a guarantee: the resulting product will be no less expressive than what it intends to replace. It will also help in phase two, where it will become necessary to explain to those who are familiar with makefiles how things are done under the hood. This can be shown through comparison and contrast, which is something I will become familiar with in phase two.

What is opaque to me at the moment is how these parser generators achieve their speed. They have to be emitting optimized Haskell, and I do not yet understand how this is achieved. My guess is that they side-step the optimizations that would ordinarily be carried out by the compiler, applying their own set of domain-specific optimizations instead.
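
If precomputed tables are the trick, the general shape might be something like this toy (a deliberately simplified sketch of my own, not the actual output of Happy or Alex): the expensive automaton construction happens at generation time, and the runtime loop is little more than array indexing.

    import Data.Array

    -- Toy DFA recognizing nonempty digit strings, encoded as an array.
    -- States: 0 = start, 1 = in digits, -1 = reject.
    -- Character classes: 0 = digit, 1 = anything else.
    table :: Array (Int, Int) Int
    table = listArray ((0, 0), (1, 1)) [1, -1, 1, -1]

    classify :: Char -> Int
    classify c
      | c >= '0' && c <= '9' = 0
      | otherwise            = 1

    -- The whole runtime "parser" is this indexing loop; all of the
    -- automaton construction happened before the program ran.
    accepts :: String -> Bool
    accepts = go 0
      where
        go s []     = s == 1
        go (-1) _   = False
        go s (c:cs) = go (table ! (s, classify c)) cs

Here accepts "123" is True and accepts "12a" is False; the point is only that no analysis of the grammar happens while parsing.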

--------------------------------------------------
From: "Simon Marlow" <marlo...@gmail.com>
Sent: Thursday, 11 March 2010 03:28
To: "John D. Earle" <johndea...@cox.net>
Cc: "cvs-ghc" <cvs-ghc@haskell.org>
Subject: Re: Fast Haskell Parser

On 11/03/2010 01:25, John D. Earle wrote:
Hi, Ben! Thanks for the input. I went to the Parsec and Attoparsec
parser links. Attoparsec was new to me. From the Parsec link:

Combinator parsers are written and used within the same programming
language as the rest of the program. The parsers are first-class
citizens of the language, unlike Happy parsers, which must be generated
via a preprocessor.

End Quote

I may want to go with Parsec or a cousin just because the approach seems
elegant. What I am fishing for is something you only learn from
experience, or learn from talking to people who have experience. My
impression is that if you want speed, Happy is the way to go. I will be
making an investment in one paradigm or the other, so I need to
understand the relative merits of each in order to make a decision.

I would think Happy is a dark place whereas Parsec is a place of light,
which becomes important where correctness that I can personally attest
to is concerned. What I am getting at is my impression that with Parsec
the concepts involved are more important than the actual code itself; I
suspect that with Happy you could understand the concepts involved and
the tool would still remain necessary. I value the educational
experience, and Parsec or a cousin may provide a better one. Is Parsec
slow as a snail compared to Happy, or are they similar? Parsec certainly
seems more flexible, but my question concerns performance. You know, the
embarrassing facts that you won't find in the brochure.

Happy/Alex tend to work well for programming-language type tasks where there is a clear lexical syntax and the grammar is close to LALR(1). Parsec works well for more ad-hoc grammars, and where there is ambiguity; Parsec tends to be more flexible in that regard. Happy on the other hand will guarantee that your grammar has no ambiguity, and Alex will be more efficient than parsing the equivalent regular expressions in a combinator library because it is doing the NFA->DFA conversion beforehand (although IIRC the Utrecht combinator library does this at runtime?).
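
A toy illustration of the ambiguity point (my own sketch, not from Simon's message): in a combinator library, overlapping alternatives are resolved silently by ordered choice, whereas Happy would flag a conflict when generating the parser.

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Both alternatives can match input starting with "if". Ordered
    -- choice simply commits to the first branch that succeeds, so
    -- "ifdef" is never recognized as one keyword here, and nothing
    -- ever warns about the overlap.
    overlapping :: Parser String
    overlapping = try (string "if") <|> string "ifdef"

Running parse overlapping "" "ifdef" succeeds with "if" and quietly leaves "def" unconsumed.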

Personally I find Parsec and the other parser combinator libraries quite difficult to use when it comes to deciding where to put 'try'; ReadP is the exception here, because it does general backtracking and doesn't make you decide where to use 'try'. I had an interesting experience with writing GHC's parser for the strings inside foreign import declarations recently - I tried various different combinator libraries to see how it came out, and none of them made it easy. I must write a blog post about that sometime.
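
A minimal sketch of the 'try' pitfall described here (my own example, using Text.Parsec):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Fails on "lexical": the first branch consumes "le" before
    -- failing, and Parsec will not backtrack past consumed input.
    keywordBad :: Parser String
    keywordBad = string "let" <|> string "lexical"

    -- Succeeds: 'try' rewinds the input when its argument fails, so
    -- the second alternative starts from the original position.
    keywordGood :: Parser String
    keywordGood = try (string "let") <|> string "lexical"

Deciding which branches need the 'try' is exactly the judgment call that makes these libraries hard to use.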

As for performance, I don't think anyone has done a rigorous comparison - it's hard to do, because you have to implement a whole parser twice in very different ways, and then you only get results for one grammar. Alex and Happy are fast enough for GHC - parsing is almost never the bottleneck.

Cheers,
Simon

Thank you; this has helped clarify my thinking.

--------------------------------------------------
From: "Ben Lippmeier" <b...@ouroborus.net>
Sent: Wednesday, 10 March 2010 17:34
To: "John D. Earle" <johndea...@cox.net>
Cc: <haskell-c...@haskell.org>
Subject: Re: Fast Haskell Parser


Hi John,
Doing a Google search for "haskell parser" returns the following link
as its first result. That's the parser that GHC uses.

http://www.haskell.org/happy/

You could also check out the following:

http://www.haskell.org/haskellwiki/Parsec
http://hackage.haskell.org/package/attoparsec

This would also be a perfect question to ask on the haskell-cafe
mailing list...

Cheers,
Ben.


On 11/03/2010, at 10:39 AM, John D. Earle wrote:

I was thinking of ways to create an efficient Haskell parser. My
initial thinking was to write a top-down parser in Haskell, but if
you want speed, a table-driven approach may make more sense.
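
By a top-down parser I mean something hand-written along these lines (a toy sketch of my own, parsing sums of single digits):

    -- A parser maps input to a result plus the leftover input.
    type P a = String -> Maybe (a, String)

    digitP :: P Int
    digitP (c:cs)
      | c >= '0' && c <= '9' = Just (fromEnum c - fromEnum '0', cs)
    digitP _ = Nothing

    -- expr ::= digit ('+' expr)?  -- the grammar drives the recursion
    expr :: P Int
    expr s = do
      (n, rest) <- digitP s
      case rest of
        ('+' : rest') -> do
          (m, rest'') <- expr rest'
          Just (n + m, rest'')
        _ -> Just (n, rest)

Here expr "1+2+3" evaluates to Just (6, "").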

Due to the existence of build bots there is a certain degree of
complacency concerning build times. I feel that convenience is an
important factor: it should be convenient to build the source. Build
bots make an assumption, namely the existence of a formal
infrastructure. I believe that it should be possible to build
something from source casually.

This is a less demanding goal than high-performance incremental
builds. It would be nice to outperform makefiles, because if you
fail to do this, can it really be said that you are making progress?
Correctness appears to be a low priority among computer programmers.
That said, it may be worth investing some time in advance in figuring
out how best to achieve both objectives, namely correctness and
performance. Who knows, skills acquired in one project may be useful
in another, and performance is usually welcome.

So my question is: what sort of tools and methodologies exist in
Haskell to create high-performance parsers? My impression is that the
speed at which the parser performs its task is not the bottleneck,
but the parser might as well be designed to be efficient so as not to
be intellectually lazy. It may even turn out that the parser needs
to be efficient merely to compensate for the spawn of correctness,
namely slow builds.
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
