Simon,
Happy is a domain-specific language (DSL), and as such it should make
stating the problem simpler. A Turing-complete language admits
pathological cases and convoluted expression, both of which are problems
where automation is concerned, so what you said makes sense to me.
Parsec, though not itself Turing complete by design, uses a
Turing-complete language to achieve its goals. This invites syntactic
complexity and convoluted expression; only the outright pathological
cases have likely been ruled out.
Suppose someone created a Haskell compiler using Parsec, and suppose this
made the compiler take twice as long to compile programs as GHC does.
Would such a compiler be competitive with GHC? It would put the product
in retrograde if this were thought an improvement. That speaks to the
risk involved. I do not anticipate working with ad-hoc or especially
ambiguous grammars; the grammars are context-free grammars (CFGs).
I hope to create something that will parse a makefile, which could become
a product in its own right, though that is not my long-term goal. Slow is
not on the menu; make has a reputation for speed. This is what I hope to
achieve in phase one, and there is method to my madness.
Why is this important? It creates a guarantee: the resulting product will
be no less expressive than what it intends to replace. It will also help
in phase two, where it will become necessary to explain to those who are
familiar with makefiles how the same things are done in the new system.
This can be shown through comparison and contrast, which is something I
will become familiar with in phase two.
What is opaque to me at the moment is how these parser generators achieve
their speed. They must be emitting optimized Haskell, but I do not yet
understand how. My guess is that they sidestep the optimizations the
compiler would ordinarily carry out and apply their own set of
domain-specific optimizations instead.
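To make my guess concrete, here is the kind of code I imagine such a tool
emitting (a hand-written illustration of a table-driven lexer, not actual
Happy or Alex output; the token names and states are made up): all of the
analysis happens when the tool runs, so at parse time there is only a
tight loop over a fixed transition function.

-- Illustrative only: a hand-written stand-in for the kind of lexer a
-- generator might emit.  The "table" (here, the step function) is fixed
-- before scanning begins, so there is no backtracking at run time.
import Data.Char (isAlpha, isDigit, isSpace)

data Token = TIdent String | TNumber String
  deriving Show

-- DFA states: 0 = start, 1 = inside an identifier, 2 = inside a number.
step :: Int -> Char -> Maybe Int
step 0 c | isAlpha c              = Just 1
         | isDigit c              = Just 2
step 1 c | isAlpha c || isDigit c = Just 1
step 2 c | isDigit c              = Just 2
step _ _                          = Nothing

scan :: String -> [Token]
scan []                 = []
scan (c:cs) | isSpace c = scan cs
scan s =
  case munch 0 s of
    ([],     _)    -> error ("lexical error at: " ++ take 10 s)
    (lexeme, rest) -> mkTok lexeme : scan rest
  where
    munch st (x:xs) | Just st' <- step st x =
      let (ys, rest) = munch st' xs in (x:ys, rest)
    munch _ xs = ([], xs)
    mkTok l@(x:_) | isDigit x = TNumber l
                  | otherwise = TIdent l
    mkTok []      = error "impossible: empty lexeme"

If the generated code is anything like this, the speed comes from the
automaton being computed ahead of time rather than from any special
optimization trickery aimed at the compiler.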
--------------------------------------------------
From: "Simon Marlow" <marlo...@gmail.com>
Sent: Thursday, 11 March 2010 03:28
To: "John D. Earle" <johndea...@cox.net>
Cc: "cvs-ghc" <cvs-ghc@haskell.org>
Subject: Re: Fast Haskell Parser
On 11/03/2010 01:25, John D. Earle wrote:
Hi, Ben! Thanks for the input. I went to the Parsec and Attoparsec
parser links. Attoparsec was new to me. From the Parsec link:
Combinator parsers are written and used within the same programming
language as the rest of the program. The parsers are first-class
citizens of the language, unlike Happy parsers, which must be generated
via a preprocessor.
End Quote
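Even a toy example shows what "first-class" buys you (this sketch is my
own, not something from the Parsec documentation): number and identifier
below are ordinary Haskell values, handed to combinators such as <|> and
sepBy like any other value, with no separate generation step.

-- A toy illustration of combinator parsing: the parsers are plain
-- Haskell values composed with ordinary functions and operators.
import Text.Parsec
import Text.Parsec.String (Parser)

number :: Parser Integer
number = read <$> many1 digit

identifier :: Parser String
identifier = (:) <$> letter <*> many alphaNum

-- A comma-separated list whose elements are numbers or identifiers.
item :: Parser (Either Integer String)
item = (Left <$> number) <|> (Right <$> identifier)

items :: Parser [Either Integer String]
items = item `sepBy` (char ',' *> spaces)

main :: IO ()
main = print (parse items "<input>" "1, foo, 42")
-- Right [Left 1,Right "foo",Left 42]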
I may want to go with Parsec or a cousin just because the approach seems
elegant. What I am fishing for is something you only learn from
experience, or from talking to people who have experience. My impression
is that if you want speed, Happy is the way to go. I will be making an
investment in one or the other paradigm, so I need to understand the
relative merits of each in order to make a decision.
I would think Happy is a dark place whereas Parsec is a place of light,
which becomes important where correctness that I can personally attest
to is concerned. What I am getting at is that my impression is that with
Parsec the concepts involved matter more than the actual code itself,
whereas I suspect that with Happy you could understand the concepts and
the tool would still be necessary. I value the educational experience,
and Parsec or a cousin may provide the better one. Is Parsec slow as a
snail compared to Happy, or are they similar? Parsec certainly seems more
flexible, but my question concerns performance. You know, the
embarrassing facts that you won't find in the brochure.
Happy/Alex tend to work well for programming-language type tasks where
there is a clear lexical syntax and the grammar is close to LALR(1).
Parsec works well for more ad-hoc grammars, and where there is ambiguity;
Parsec tends to be more flexible in that regard. Happy on the other hand
will guarantee that your grammar has no ambiguity, and Alex will be more
efficient than parsing the equivalent regular expressions in a combinator
library because it is doing the NFA->DFA conversion beforehand (although
IIRC the Utrecht combinator library does this at runtime?).
Personally I find Parsec and the other parser combinator libraries quite
difficult to use when it comes to deciding where to put 'try'; ReadP is
the exception here, because it does general backtracking and doesn't make
you decide where to use 'try'. I had an interesting experience with
writing GHC's parser for the strings inside foreign import declarations
recently - I tried various different combinator libraries to see how it
came out, and none of them made it easy. I must write a blog post about
that sometime.
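The classic trap is a pair of alternatives that share a prefix: (<|>)
only falls through to the second alternative if the first one failed
without consuming any input. A made-up sketch (not the actual
foreign-import parser):

-- Minimal illustration of where 'try' matters in Parsec.
import Text.Parsec
import Text.Parsec.String (Parser)

-- Fails on the input "for": string "forall" matches the first three
-- characters, consumes them, and then fails, and (<|>) will not fall
-- back to the second alternative after input has been consumed.
keywordBad :: Parser String
keywordBad = string "forall" <|> string "for"

-- Wrapping the first alternative in 'try' turns that failure into a
-- backtracking one, so the second alternative gets its chance.
keywordGood :: Parser String
keywordGood = try (string "forall") <|> string "for"

main :: IO ()
main = do
  print (parse keywordBad  "<input>" "for")   -- Left <parse error>
  print (parse keywordGood "<input>" "for")   -- Right "for"

ReadP avoids the problem because its choice operator backtracks
regardless of how much input the first alternative consumed.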
As for performance, I don't think anyone has done a rigorous comparison -
it's hard to do, because you have to implement a whole parser twice in
very different ways, and then you only get results for one grammar. Alex
and Happy are fast enough for GHC - parsing is almost never the
bottleneck.
Cheers,
Simon
Thank you, this has helped clarify my thinking.
--------------------------------------------------
From: "Ben Lippmeier" <b...@ouroborus.net>
Sent: Wednesday, 10 March 2010 17:34
To: "John D. Earle" <johndea...@cox.net>
Cc: <haskell-c...@haskell.org>
Subject: Re: Fast Haskell Parser
Hi John,
Doing a Google search for "haskell parser" returns the following link
as its first result. That's the parser that GHC uses.
http://www.haskell.org/happy/
You could also check out the following:
http://www.haskell.org/haskellwiki/Parsec
http://hackage.haskell.org/package/attoparsec
This would also be a perfect question to ask on the haskell-cafe
mailing list...
Cheers,
Ben.
On 11/03/2010, at 10:39 AM, John D. Earle wrote:
I was thinking of ways to create an efficient Haskell parser. My
initial thinking was to write a top-down parser in Haskell, but if
you want speed, a table-driven approach may make greater sense.
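To illustrate the first option, a hand-rolled top-down parser is just a
set of recursive functions, one per grammar rule; the sketch below (a toy
for sums such as "1+2+3", folding straight to an Int) is the sort of
thing I mean, whereas a table-driven parser would replace the call
structure with precomputed tables.

-- A toy hand-written top-down (recursive-descent) parser for sums.
import Data.Char (isDigit)

-- expr ::= number ('+' number)*
expr :: String -> Maybe (Int, String)
expr s = do
  (n, rest) <- number s
  go n rest
  where
    go acc ('+':rest) = do
      (m, rest') <- number rest
      go (acc + m) rest'
    go acc rest = Just (acc, rest)

-- number ::= digit+
number :: String -> Maybe (Int, String)
number s = case span isDigit s of
  ("", _)    -> Nothing
  (ds, rest) -> Just (read ds, rest)

main :: IO ()
main = print (expr "1+2+3")   -- Just (6,"")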
Due to the existence of build bots, there is a certain degree of
complacency concerning build times. I feel that convenience is an
important factor: it should be convenient to build the source. Build
bots make an assumption, namely the existence of a formal
infrastructure. I believe it should be possible to build something from
source casually.
This is a less demanding goal than high-performance incremental builds.
It would be nice to outperform makefiles, because if you fail to do
this, can it really be said that you are making progress? Correctness
appears to be a low priority among computer programmers. That said, it
may be worth investing some time in advance in figuring out how best to
achieve both objectives, namely correctness and performance. Who knows,
skills acquired in one project may be useful in another, and performance
is usually welcome.
So my question is: what sort of tools and methodologies exist in Haskell
for creating high-performance parsers? My impression is that the speed
at which the parser performs its task is not the bottleneck, but the
parser might as well be designed to be efficient, so as not to be
intellectually lazy. It may even turn out that the parser needs to be
efficient merely to compensate for the spawn of correctness, namely
slow builds.
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc