On 26/05/2010 17:48, Brian Hurt wrote:


On Wed, May 26, 2010 at 11:27 AM, Attila Szegedi wrote:


    Yeah, but my point is that you can use *any* character after Q to
    specify the delimiter, i.e.:

     %Qafooa

    is equal to

     "foo"

    so your lexer must be ready to use as a terminating character
    whatever character follows immediately after %Q. It's just that
    it'll have special cases for some chars - most notably
    parentheses, brackets, and braces, so a string starting with %Q{
    will not be terminated by { but by }. Most people will use { and
    }, but the point is that you are free to use *any* character.


I'm not sure this feature is doable in a sane way in a classic lex/yacc parser. I think you can, in classic lex, drop down to a hand-rolled lexer if you need to, but this is a serious code smell.
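
To make the hand-rolled route concrete, here is a minimal sketch in Python of a scanner for the %Q literal described above. It is not taken from any real Ruby implementation: the names `scan_percent_q` and `CLOSERS` are hypothetical, only the three paired delimiters mentioned above are handled, nesting of paired delimiters and backslash escapes are assumptions, and interpolation is ignored entirely.

```python
# Hypothetical sketch: CLOSERS and scan_percent_q are illustrative names,
# not part of any real lexer. Only the paired delimiters named in the
# thread (parentheses, brackets, braces) are special-cased.
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def scan_percent_q(src, pos=0):
    """Scan a %Q literal starting at src[pos].

    Returns (body, next_pos), where body is the raw text between the
    delimiters and next_pos is the index just past the closing delimiter.
    """
    if not src.startswith("%Q", pos) or pos + 2 >= len(src):
        raise ValueError("expected %Q followed by a delimiter")
    opener = src[pos + 2]
    # Any character may open the literal; paired ones close with their mate.
    closer = CLOSERS.get(opener, opener)
    nests = closer != opener  # assumption: paired delimiters may nest
    depth = 1
    i = pos + 3
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            i += 2  # assumption: a backslash escapes the next character
            continue
        if nests and ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                return src[pos + 3:i], i + 1
        i += 1
    raise ValueError("unterminated %Q literal")
```

The terminator is only known after reading the character that follows %Q, so the scanner carries state that a fixed set of regular-expression token rules cannot express directly. That is the part that pushes a classic lex specification toward hand-written code.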

However, this misses the point of my original post. If you're parsing an existing language, then you don't get a choice in features. If you're parsing Ruby, you can't choose to not implement this feature. Thus, if lex and yacc can't handle this feature, you can't use lex and yacc to parse Ruby. This is where fancier parsers with more features and greater ability to handle weird syntaxes become really useful.

If you're creating a language, you have a choice: you can include hard-to-parse features or not. And the thing to remember is that there is a cost to adding the features. Every hard-to-parse feature you add reduces the number of parsers for your language that other people are willing to write, which limits the portability of the language, the number of tools for the language, etc. Even worse, every one of these features you add increases the likelihood that people who do implement other parsers for your language get it subtly wrong. Add enough of these features and there will only ever be one parser for your language: yours.

This isn't to say that you shouldn't add these features; it's that you should be aware of the trade-offs you are making, and be making them deliberately and not accidentally. With more powerful/flexible parser generators, it's much easier to add these sorts of features accidentally and paint yourself into a corner (and it's even more likely you will do this if you're implementing a hand-written parser and not using a parser generator at all).

Brian

At the same time, users don't want to understand parser or grammar limitations.
Users want things like optional semicolons, as in JavaScript or Groovy;
generics (Foo<>) or XML literals (<foo/>) alongside the traditional less-than (a < foo);
XPath query literals (document//node/*) alongside // to start a comment;
HTTP literals alongside the ?: expression; etc.

So there is a tension between having a syntax that any tool can parse and
being able to have some nice constructions in the language.

Rémi

--
You received this message because you are subscribed to the Google Groups "JVM 
Languages" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/jvm-languages?hl=en.