Sunday, August 5, 2018, 6:58:11 PM, Stephan Müller wrote:
> Am 04.07.2018 um 19:28 schrieb Daniel Dekany:
>> I wonder what parser libraries could help us, in FM3, to separate the
>> expression language parsing from the top-level language (like
>> `<#foo>`, `${...}`, etc.) parsing. Or if a hand-written parser is an
>> acceptable compromise. It would be good if we can change the top-level
>> syntax and still reuse the expression syntax. (Or, replace the
>> expression syntax, and reuse the top-level one.) Like, somebody wants
>> a syntax like `#foo(exp)` instead of `<#foo exp>`, but still reuse the
>> expression syntax. (For me it was always part of the FM3 agenda,
>> though it might prove to be too much...)
>> [..]
>
> During the last days I had a high-level look at different parser
> generators, and as one might imagine, there are a lot of parser
> generators, with different licenses, different maturities, different
> states of maintenance and so on.
>
> Due to https://www.apache.org/legal/resolved.html I ignored all parser
> generators which may not be included in Apache projects because of their
> license, especially GNU GPL etc.
>
> IMHO this leaves us with:
>
> * LL(k) parsers: ANTLR, JavaCC and Grammatica
> * LALR parsers: CookCC
> * PEG parsers: Mouse
> * parser combinators: jparsec, parboiled and PetitParser
>
> This list is not exhaustive, so I have probably missed some interesting
> projects. If so, please share, I'd like to have a look into these, too.
>
> My idea for the next step: define a really small subset of FTL and try
> to implement PoCs for this subset with the candidates which I mentioned
> above.
>
> The subset might be something like
>
> * interpolations: ${..}
> * directives: if, assign
Just to be on the safe side, I will note that you shouldn't try to
hard-code parser logic that's specific to a directive (like "if").
Instead, you should try to parse a unified/generic directive call
syntax, and then invoke the Dialect to find out the further rules. And
that's tricky, as then the parser definition doesn't specify which
tags have an end-tag pair, and what can be nested between them, only
the Dialect knows that. Like, if you look at the current parser, it
basically says that "if" is like
"<#" "if" Expression ">" MixedContent "</#" "if" ">"
which is expressive and all, but sadly it won't be possible in FM3 to
do it like that.
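
To illustrate the point above, here is a minimal sketch (in Python, only
for brevity) of a parser that knows just the generic `<#name args>` /
`</#name>` shape and asks a Dialect whether a directive has an end-tag.
Everything named here (`DirectiveSpec`, `DIALECT`, `parse`) is hypothetical
and not an FM3 API, and the tag regex is a simplification that would break
on a `>` inside an expression.

```python
import re
from dataclasses import dataclass, field

@dataclass
class DirectiveSpec:
    # What a (hypothetical) Dialect tells the parser about one directive.
    has_end_tag: bool

# Hypothetical dialect table; the parser itself hard-codes none of these names.
DIALECT = {
    "if": DirectiveSpec(has_end_tag=True),
    "assign": DirectiveSpec(has_end_tag=False),
}

@dataclass
class Node:
    name: str
    args: str
    children: list = field(default_factory=list)

# Generic directive tag: "<#name args>" or "</#name>" (simplified).
TAG = re.compile(r"<(/?)#(\w+)([^>]*)>")

def parse(template, dialect):
    root = Node("#root", "")
    stack = [root]
    pos = 0
    for m in TAG.finditer(template):
        if m.start() > pos:
            stack[-1].children.append(template[pos:m.start()])  # static text
        closing, name, args = m.group(1) == "/", m.group(2), m.group(3).strip()
        if closing:
            if len(stack) == 1 or stack[-1].name != name:
                raise SyntaxError(f"unexpected </#{name}>")
            stack.pop()
        else:
            spec = dialect.get(name)  # only the Dialect knows the rules
            if spec is None:
                raise SyntaxError(f"unknown directive: {name}")
            node = Node(name, args)
            stack[-1].children.append(node)
            if spec.has_end_tag:
                stack.append(node)  # nest content until the end-tag
        pos = m.end()
    if pos < len(template):
        stack[-1].children.append(template[pos:])
    if len(stack) > 1:
        raise SyntaxError(f"missing </#{stack[-1].name}>")
    return root
```

The grammar here says nothing about "if"; swapping in another dialect table
changes which tags nest, without touching `parse`.
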
> * expressions: numbers, variables, +
> * variants of the parsers with different delimiters
> * split into two parsers (interpolations/directives vs. expression language)
>
> What do you think?
I haven't used any parser library but JavaCC, so I have no tips
there. Otherwise the plan sounds good.
Anyway, I'm kind of repeating myself here, but these are the expectations
that may quickly narrow down the candidates:
- Splitting into two parsers, of course
- Maintainability of custom syntax variations (that is, new FreeMarker
versions shouldn't break them, or at least regenerating them should
need no manual work)
- How parsing partially driven by the Dialect looks... it won't fit
JavaCC well for example. (But, probably it won't be very nice with
any of them.)
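
As a rough illustration of the first expectation (two separate parsers),
the sketch below keeps one expression parser for the tiny subset discussed
in this thread (numbers, variables, `+`) and plugs it into two different
top-level syntaxes, `<#foo exp>` and `#foo(exp)`. It is written in Python
only for brevity; the function names and the regex-based top level are
assumptions made for the example, not a proposed design.

```python
import re

def parse_expression(src):
    """Expression parser for the tiny subset: numbers, variables, '+'."""
    ast = None
    for term in src.split("+"):
        term = term.strip()
        if re.fullmatch(r"\d+", term):
            leaf = ("num", int(term))
        elif re.fullmatch(r"[A-Za-z_]\w*", term):
            leaf = ("var", term)
        else:
            raise SyntaxError(f"bad operand: {term!r}")
        # Build a left-associative chain of additions.
        ast = leaf if ast is None else ("add", ast, leaf)
    return ast

def make_top_level_parser(directive_pattern):
    """Build a top-level parser from a regex with groups 'name' and 'exp'."""
    rx = re.compile(directive_pattern)

    def parse(src):
        # The top level only finds directive calls; the expression text is
        # handed to the shared expression parser untouched.
        return [(m.group("name"), parse_expression(m.group("exp")))
                for m in rx.finditer(src)]

    return parse

# Two top-level syntaxes sharing the same expression parser.
angle_syntax = make_top_level_parser(r"<#(?P<name>\w+)\s+(?P<exp>[^>]+)>")
call_syntax = make_top_level_parser(r"#(?P<name>\w+)\((?P<exp>[^)]*)\)")
```

With this split, `angle_syntax("<#foo x + 1>")` and
`call_syntax("#foo(x + 1)")` produce the same result, since both delegate
to `parse_expression`.
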
If several of the libraries survive this filtering, some further extras
that could decide between them:
- More understandable/helpful error messages are a big plus.
- It would be interesting to see how hard it is to write a parser that
continues parsing after the first error, to catch more errors. This
is mostly for IDEs.
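
A very small sketch of the "continue after the first error" idea, again in
Python and again only an assumption about how it could look: each
interpolation is checked on its own, an error is recorded rather than
raised, and checking resumes at the next `${...}`, using the closing `}`
as the synchronization point. Real parser libraries differ a lot in how
much of this they do for you.

```python
import re

INTERPOLATION = re.compile(r"\$\{([^}]*)\}")
OPERAND = re.compile(r"\d+|[A-Za-z_]\w*")

def check_template(src):
    """Return every error found, instead of stopping at the first one."""
    errors = []
    for m in INTERPOLATION.finditer(src):
        for part in m.group(1).split("+"):
            operand = part.strip()
            if not OPERAND.fullmatch(operand):
                # Record the error with the offset of the interpolation...
                errors.append((m.start(), f"bad operand {operand!r}"))
                # ...then give up on this one and resume at the next.
                break
    return errors
```

For example, `check_template("${a + 1} ${a +} ${1x} ${b}")` reports two
errors, one for `${a +}` and one for `${1x}`, in a single pass.
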
> Stephan.
>
> P.S.: my more detailed list of parser generators can be found here:
> https://gist.github.com/chaquotay/8041096bad36f6f3f0d4166d6f8623b5
--
Thanks,
Daniel Dekany