> >         - parsing is quick
> I'll bet you 1 pint that MLj's parser+lexer are quicker on correct
> files than GHC's.  Deal?

Ok.  It's going to be hard to get a fair comparison here, but I've just done
a rough measurement on GHC:

~/nightly/boot-boot-i386-unknown-linux/ghc/compiler > wc -l parser/Parser.hs
   9743 parser/Parser.hs
~/nightly/boot-boot-i386-unknown-linux/ghc/compiler > time \
  /home/simonmar/nightly/boot-i386-unknown-linux/ghc/driver/../compiler/hsc \
  /tmp/ghc17836.cpp -fglasgow-exts -fignore-interface-pragmas \
  -fomit-interface-pragmas -fsimplify [ -finline-phase2 \
  -fmax-simplifier-iterations4 ] -fwarn-overlapping-patterns \
  -fwarn-missing-methods -fwarn-duplicate-exports -fhi-version=404 -static \
  -himap=utils%.hi:basicTypes%.hi:types%.hi:hsSyn%.hi:prelude%.hi:rename%.hi:\
typecheck%.hi:deSugar%.hi:coreSyn%.hi:specialise%.hi:simplCore%.hi:\
stranal%.hi:stgSyn%.hi:simplStg%.hi:codeGen%.hi:absCSyn%.hi:main%.hi:\
profiling%.hi:parser%.hi:usageSP%.hi:cprAnalysis%.hi:nativeGen%.hi:.%.hi:\
/home/simonmar/nightly/boot-i386-unknown-linux/ghc/driver/../lib/exts%.hi:\
/home/simonmar/nightly/boot-i386-unknown-linux/ghc/driver/../lib/exts%.hi:\
/home/simonmar/nightly/boot-i386-unknown-linux/ghc/driver/../lib/std%.hi \
  -dcore-lint -v -hifile=/tmp/ghc17836.hi -C=/tmp/ghc17836.hc \
  -F=/tmp/ghc17836_stb.c -FH=/tmp/ghc17836_stb.h \
  +RTS -H64000000 -K2000000 -S/tmp/ghc17836.stat
Glasgow Haskell Compiler, version 4.04, for Haskell 98, compiled by GHC version 4.04

end
zsh: 18569 exit 1  /home/simonmar/nightly/boot-i386-unknown-linux/ghc/driver/../compiler/hsc

1.75s real   1.38s user   0.37s system   99%  /home/simonmar/nightly/boot-i386-unknown-linux/ghc/driver/../compiler/hsc

That works out, roughly speaking, at about 1 line every 0.14ms.  I think
that's about a factor of 10 slower than what you'd get with lex/yacc in C,
but I'm fairly happy with the result.  Besides, I bet about 90% of that time
is spent in the lexer rather than the parser.  I could get a decent speedup
by having Happy generate C tables for the parser rather than the Haskell
state machine (actually, I'm seriously tempted to do this anyway).

This is a recent ghc-4.04 on a 400MHz PII w/ Linux, hacked to stop after
parsing, BTW.
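
(Sanity-checking that figure, taking the 1.38s user time above as the
lex+parse cost for the 9743-line file; a two-line Haskell definition just to
check the arithmetic:)

  -- ms per line = user time (in ms) / number of lines
  msPerLine :: Double
  msPerLine = 1.38 * 1000 / 9743    -- ~0.1416, i.e. about 0.14ms/line,
                                    -- or roughly 7000 lines per second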


> >         - error-correction is by definition unreliable
> Yes of course.  Any intelligent user of a compiler knows that if they
> have two parsing messages close to each other, the second may well be
> wrong.  But it shouldn't happen too often if the parser is well
> written.
> >         - error-correction is hard to implement well
> >
> > you might think a good policy is "if you find a parse error, just
> > read up to the next semicolon & carry on", but layout totally screws
> > this idea.  You often can't tell whether the next definition belongs
> > to the current scope or some other scope, and if you get out of step
> > everything goes wrong.
> See how ml-yacc does it:
>    http://cm.bell-labs.com/cm/cs/what/smlnj/doc/ML-Yacc/
> It tries quite hard to find corrections involving the minimum amount
> of correction of recently-read tokens.  It seems to work quite well.
> The thing is you can afford to work quite hard doing error-correction
> on incorrect programs, since they don't occur very often.  The only
> disadvantage is that it makes the parser very slow when fed total
> bilge unless you put in some kind of cut-out.  (Thus MLj's parser took
> forever when fed the last Labour manifesto.)
>
> Scoping would be a problem if you tried to do anything more to
> incorrect programs after you've parsed them.  Actually I don't think
> that's a good idea, so scoping is not a problem.

So what we're talking about is just parser recovery?  Not even passing the
corrected output through the rest of the front end?  Then I'm even less
convinced.  Given that the longest you have to wait for a parse error on any
human-written Haskell file is at most a second, what's the point?
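
(For concreteness, the "read up to the next semicolon" policy from my
earlier message is roughly the hypothetical sketch below; the Token type is
invented for illustration and none of this is GHC's actual parser:)

  -- Hypothetical token type; TSemi and TVRBrace stand for the (possibly
  -- layout-generated) semicolons and close-braces.
  data Token = TSemi | TVLBrace | TVRBrace | TOther String
    deriving (Eq, Show)

  -- Panic-mode recovery: on a parse error, throw tokens away up to and
  -- including the next semicolon, then resume parsing from there.
  skipToSemi :: [Token] -> [Token]
  skipToSemi = drop 1 . dropWhile (/= TSemi)

The trouble is exactly the layout problem: the span you just threw away may
contain virtual close-braces, so after the skip the parser's layout context
no longer matches the lexer's, and every later definition can end up
attributed to the wrong scope.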

Actually, I think the best solution is to have a parser that runs in the
background while you're editing, so you'll know if the program is at least
syntactically correct before even running the compiler.
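
(A minimal sketch of that idea, assuming nothing cleverer than a polling
loop; parseCheck is a made-up name for whatever batch syntax checker you
plug in:)

  import Control.Concurrent (threadDelay)
  import Control.Monad (forever, when)
  import Data.IORef
  import System.Directory (getModificationTime)

  -- Re-run a (hypothetical) syntax check whenever the file changes.
  watch :: FilePath -> (FilePath -> IO ()) -> IO ()
  watch file parseCheck = do
    lastMod <- newIORef Nothing
    forever $ do
      t   <- getModificationTime file
      old <- readIORef lastMod
      when (old /= Just t) $ do
        writeIORef lastMod (Just t)
        parseCheck file
      threadDelay 500000              -- poll twice a second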

> Yes, that's what I think needs to be done.  MLj actually has the
> location (line + character) of the start and end of almost every
> syntactic construct.  This is not really that expensive.  I stole
> SML/NJ's trick of recording the character number in the file (rather
> than just the line number, or a pair of line and column number); if
> you put in some extra magic in the lexer to remember when the lines
> end, you can very easily get a real line and column number back out.

Ok, one for the wish list.
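
(As I understand the trick, it's cheap because the lexer only ever stores a
flat character offset per token, plus the offsets at which newlines
occurred; line and column are reconstructed on demand.  A rough sketch, with
invented names:)

  import Data.List (findIndex)

  type Offset = Int                -- absolute character position in file

  -- Offsets of the '\n' characters, collected once by the lexer.
  lineEnds :: String -> [Offset]
  lineEnds s = [ i | (i, c) <- zip [0..] s, c == '\n' ]

  -- Recover (line, column), both 1-based, from an absolute offset.
  lineCol :: [Offset] -> Offset -> (Int, Int)
  lineCol ends off =
    case findIndex (>= off) ends of
      Just n  -> (n + 1, off - lineStart n + 1)
      Nothing -> (length ends + 1, off - lineStart (length ends) + 1)
    where
      lineStart 0 = 0
      lineStart n = ends !! (n - 1) + 1

In a real implementation the newline offsets would live in an array and the
lookup would be a binary search, but the linear version shows the idea: the
common (error-free) case pays almost nothing.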

> Parser combinators are fine if the grammar is very simple or you don't
> care about CPU times.  But using them in a serious compiler for
> Haskell would be like building a computer out of stone knives and
> bearskins.

It was you who suggested the parser be rewritten.  So what's the
alternative?

Simon
