Simon,

Happy is a domain-specific language (DSL), and as such it likely makes stating the problem simpler. A Turing-complete language admits pathological cases and convoluted expression, both of which are forms of pathology where automation is concerned, so what you said makes sense to me. Parsec, though by design it cannot be Turing complete itself, uses a Turing-complete language to achieve its goals. This invites syntactic complexity and convoluted expression; only the pathological cases are likely ruled out.
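
To illustrate what embedding in a Turing-complete host buys, for better and worse, here is a minimal sketch of my own using Text.Parsec (the keyword list is hypothetical):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- A combinator parser is an ordinary Haskell value, so plain list
    -- functions can assemble it; a generator's grammar file cannot be
    -- built up this way inside the program.
    keyword :: Parser String
    keyword = choice (map (try . string) ["include", "ifdef", "endif"])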

Suppose someone set out to create a Haskell compiler using Parsec, and suppose this caused the compiler to take twice as long to compile code as GHC does. Would such a compiler be competitive with GHC? It would put the product into retrograde if this were thought an improvement, which speaks to the risk involved. I do not anticipate working with ad-hoc or especially ambiguous grammars; the grammars are context-free grammars (CFGs).

I hope to create something that will parse a makefile, something capable of becoming a product in its own right, though that is not my long-term goal. Slow is not on the menu: make has a reputation for speed. This is what I hope to achieve in phase one, and there is method to my madness.
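
To make phase one concrete, a first cut might look like the following (a minimal sketch of my own using Parsec, ignoring variables, line continuations, and most of real make syntax; the Rule type and its field names are mine):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- A make rule: target, dependencies, and its command lines.
    data Rule = Rule
      { ruleTarget :: String
      , ruleDeps   :: [String]
      , ruleCmds   :: [String]
      } deriving Show

    -- Horizontal whitespace only; newlines are significant in make.
    hspace :: Parser ()
    hspace = skipMany (oneOf " \t")

    word :: Parser String
    word = many1 (noneOf " \t\n:")

    -- "target: dep1 dep2" on one line, then tab-prefixed command lines.
    rule :: Parser Rule
    rule = do
      t <- word
      hspace
      _ <- char ':'
      hspace
      ds <- many (word <* hspace)
      _ <- newline
      cs <- many (char '\t' *> manyTill anyChar newline)
      return (Rule t ds cs)

For example, parseTest rule "all: foo.o bar.o\n\tgcc -o all foo.o bar.o\n" prints a Rule with two dependencies and one command line.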

Why is this important? It creates a guarantee: the resulting product will be no less expressive than what it intends to replace. It will also help in phase two, where it will become necessary to explain to those who are familiar with makefiles how things are done under the hood. This can be shown through comparison and contrast, which is something I will become familiar with in phase two.

What is opaque to me at the moment is how these parser generators achieve their speed. They have to be emitting optimized Haskell, and I do not yet understand how this is achieved. My guess is that they side-step the optimizations that would ordinarily be carried out by the compiler, applying their own set of domain-specific optimizations instead.
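
If precomputed tables are the trick, the general shape might be something like this toy (a deliberately simplified sketch of my own, not the actual output of Happy or Alex): the expensive automaton construction happens at generation time, and the runtime loop is little more than array indexing.

    import Data.Array

    -- Toy DFA recognizing nonempty digit strings, encoded as an array.
    -- States: 0 = start, 1 = in digits, -1 = reject.
    -- Character classes: 0 = digit, 1 = anything else.
    table :: Array (Int, Int) Int
    table = listArray ((0, 0), (1, 1)) [1, -1, 1, -1]

    classify :: Char -> Int
    classify c
      | c >= '0' && c <= '9' = 0
      | otherwise            = 1

    -- The whole runtime "parser" is this indexing loop; all of the
    -- automaton construction happened before the program ran.
    accepts :: String -> Bool
    accepts = go 0
      where
        go s []     = s == 1
        go (-1) _   = False
        go s (c:cs) = go (table ! (s, classify c)) cs

Here accepts "123" is True and accepts "12a" is False; the point is only that no analysis of the grammar happens while parsing.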

--------------------------------------------------
From: "Simon Marlow" <marlo...@gmail.com>
Sent: Thursday, 11 March 2010 03:28
To: "John D. Earle" <johndea...@cox.net>
Cc: "cvs-ghc" <cvs-ghc@haskell.org>
Subject: Re: Fast Haskell Parser

On 11/03/2010 01:25, John D. Earle wrote:
Hi, Ben! Thanks for the input. I went to the Parsec and Attoparsec
parser links. Attoparsec was new to me. From the Parsec link:

Combinator parsers are written and used within the same programming
language as the rest of the program. The parsers are first-class
citizens of the language, unlike Happy parsers, which must be generated
via a preprocessor.

End Quote

I may want to go with Parsec or a cousin just because the approach seems
elegant. What I am fishing for is something you only learn from
experience, or learn from talking to people who have experience. My
impression is that if you want speed, Happy is the way to go. I will be
making an investment in one paradigm or the other, so I need to
understand the relative merits of each in order to make a decision.

I would think Happy is a dark place whereas Parsec is a place of light,
which becomes important where correctness that I can personally attest
to is concerned. What I am getting at is my impression that with Parsec
the concepts involved are more important than the actual code itself; I
suspect that with Happy you could understand the concepts involved and
the tool would still remain necessary. I value the educational
experience, and Parsec or a cousin may provide a better one. Is Parsec
slow as a snail compared to Happy, or are they similar? Parsec certainly
seems more flexible, but my question concerns performance. You know, the
embarrassing facts that you won't find in the brochure.

Happy/Alex tend to work well for programming-language type tasks where there is a clear lexical syntax and the grammar is close to LALR(1). Parsec works well for more ad-hoc grammars, and where there is ambiguity; Parsec tends to be more flexible in that regard. Happy on the other hand will guarantee that your grammar has no ambiguity, and Alex will be more efficient than parsing the equivalent regular expressions in a combinator library because it is doing the NFA->DFA conversion beforehand (although IIRC the Utrecht combinator library does this at runtime?).
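
A toy illustration of the ambiguity point (my own sketch, not from Simon's message): in a combinator library, overlapping alternatives are resolved silently by ordered choice, whereas Happy would flag a conflict when generating the parser.

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Both alternatives can match input starting with "if". Ordered
    -- choice simply commits to the first branch that succeeds, so
    -- "ifdef" is never recognized as one keyword here, and nothing
    -- ever warns about the overlap.
    overlapping :: Parser String
    overlapping = try (string "if") <|> string "ifdef"

Running parse overlapping "" "ifdef" succeeds with "if" and quietly leaves "def" unconsumed.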

Personally I find Parsec and the other parser combinator libraries quite difficult to use when it comes to deciding where to put 'try'; ReadP is the exception here, because it does general backtracking and doesn't make you decide where to use 'try'. I had an interesting experience with writing GHC's parser for the strings inside foreign import declarations recently - I tried various different combinator libraries to see how it came out, and none of them made it easy. I must write a blog post about that sometime.
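
A minimal sketch of the 'try' pitfall described here (my own example, using Text.Parsec):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Fails on "lexical": the first branch consumes "le" before
    -- failing, and Parsec will not backtrack past consumed input.
    keywordBad :: Parser String
    keywordBad = string "let" <|> string "lexical"

    -- Succeeds: 'try' rewinds the input when its argument fails, so
    -- the second alternative starts from the original position.
    keywordGood :: Parser String
    keywordGood = try (string "let") <|> string "lexical"

Deciding which branches need the 'try' is exactly the judgment call that makes these libraries hard to use.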

As for performance, I don't think anyone has done a rigorous comparison - it's hard to do, because you have to implement a whole parser twice in very different ways, and then you only get results for one grammar. Alex and Happy are fast enough for GHC - parsing is almost never the bottleneck.

Cheers,
Simon

Thank you; this has helped clarify my thinking.

--------------------------------------------------
From: "Ben Lippmeier" <b...@ouroborus.net>
Sent: Wednesday, 10 March 2010 17:34
To: "John D. Earle" <johndea...@cox.net>
Cc: <haskell-c...@haskell.org>
Subject: Re: Fast Haskell Parser


Hi John,
Doing a Google search for "haskell parser" returns the following link
as its first result. That's the parser that GHC uses.

http://www.haskell.org/happy/

You could also check out the following:

http://www.haskell.org/haskellwiki/Parsec
http://hackage.haskell.org/package/attoparsec

This would also be a perfect question to ask on the haskell-cafe
mailing list...

Cheers,
Ben.


On 11/03/2010, at 10:39 AM, John D. Earle wrote:

I was thinking of ways to create an efficient Haskell parser. My
initial thinking was to write a top-down parser in Haskell, but if
you want speed, a table-driven approach may make more sense.
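
By a top-down parser I mean something hand-written along these lines (a toy sketch of my own, parsing sums of single digits):

    -- A parser maps input to a result plus the leftover input.
    type P a = String -> Maybe (a, String)

    digitP :: P Int
    digitP (c:cs)
      | c >= '0' && c <= '9' = Just (fromEnum c - fromEnum '0', cs)
    digitP _ = Nothing

    -- expr ::= digit ('+' expr)?  -- the grammar drives the recursion
    expr :: P Int
    expr s = do
      (n, rest) <- digitP s
      case rest of
        ('+' : rest') -> do
          (m, rest'') <- expr rest'
          Just (n + m, rest'')
        _ -> Just (n, rest)

Here expr "1+2+3" evaluates to Just (6, "").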

Due to the existence of build bots there is a certain degree of
complacency concerning build times. I feel that convenience is an
important factor: it should be convenient to build the source. Build
bots make an assumption, namely the existence of a formal
infrastructure. I believe that it should be possible to build
something from source casually.

This is a less demanding goal than high-performance incremental
builds. It would be nice to outperform makefiles, because if you
fail to do this, can it really be said that you are making progress?
Correctness appears to be a low priority among computer programmers.
That said, it may be worth investing some time in advance in figuring
out how best to achieve both objectives, namely correctness and
performance. Who knows, skills acquired in one project may be useful
in another, and performance is usually welcome.

So my question is: what sort of tools and methodologies exist in
Haskell to create high-performance parsers? My impression is that the
speed at which the parser performs its task is not the bottleneck,
but the parser might as well be designed to be efficient so as not to
be intellectually lazy. It may even turn out that the parser needs
to be efficient merely to compensate for the spawn of correctness,
namely slow builds.
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
