Re: Anybody interested in some FM3 parser research?

Daniel Dekany Tue, 07 Aug 2018 05:59:20 -0700

Tuesday, August 7, 2018, 10:42:02 AM, Angelo zerr wrote:

> Hi Daniel,
>
> Many thanks for working on this issue.

Actually it's Stephan who works on it at the moment. Also, note that
it's for FM3, not for FM2. Though FM3 will need IDE plugins as well.

> In my case, I'm waiting for "tolerant" parser feature to continue my
> work with Freemarker Language Server

Honestly, I doubt that we will be able to reuse this next generation
FM3 parser for FM2, so I'm still saying that the old plugin can't
recover either, and for now the new plugin only needs to beat the old
one. The production parser of FM2 must be very strictly backward
compatible (it must emulate all the historical glitches, according to
the incompatibleImprovements setting). Surely the parser for the IDE
need not be that accurate, and so with quite significant work, the
next generation FM3 parser could be backported to parse FM2. However,
in the IDE-s it can then only be used in addition to the real FM2
parser, since you want to catch all parse errors that will pop up in
production.

Anyway, for now we should focus on FM3, and then we will see better
what can be backported to FM2.

> https://github.com/angelozerr/freemarker-languageserver/ which uses
> a custom tolerant parser (which basically parses XML). If you could also
> manage the capability to update an existing Freemarker DOM from a content
> change (e.g., the user types a space or some FM content in the editor), it
> would be fantastic.

Supporting user-defined dialects (a set of user defined directives and
functions that are resolved/validated during parsing, plus maybe
custom syntax) is a main goal of FM3, if that covers what you mean.
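
To make "resolved during parsing" a bit more concrete, here is a
minimal Java sketch of a parser asking the dialect about a directive
right after reading its name. All names in it (DialectSketch, Dialect,
DirectiveKind, lookupDirective, expectsNestedContent) are hypothetical
and are not existing FreeMarker API; it only shows the shape of the
idea.

```java
import java.util.Map;

// Hypothetical sketch; none of these types exist in FreeMarker.
final class DialectSketch {

    /** What the parser needs to know once it has read a directive name. */
    enum DirectiveKind { WITH_END_TAG, WITHOUT_END_TAG }

    /** Assumed minimal view of a dialect: it resolves directive names. */
    interface Dialect {
        /** Returns null if the dialect doesn't define the directive. */
        DirectiveKind lookupDirective(String name);
    }

    /** Builds a dialect from a fixed name-to-kind map (for illustration only). */
    static Dialect mapDialect(Map<String, DirectiveKind> directives) {
        return directives::get;
    }

    /**
     * Parse-time decision made after the generic start of a directive call
     * was read: does the parser have to look for nested content and an end tag?
     * Unknown directives are reported as an error at parse time.
     */
    static boolean expectsNestedContent(Dialect dialect, String directiveName) {
        DirectiveKind kind = dialect.lookupDirective(directiveName);
        if (kind == null) {
            throw new IllegalArgumentException("Unknown directive: " + directiveName);
        }
        return kind == DirectiveKind.WITH_END_TAG;
    }
}
```

The point is only that the grammar itself knows nothing about "if" or
any other directive; the answer comes from the dialect object supplied
at runtime.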

> It would avoid reparsing the full content of the editor to rebuild the
> Freemarker DOM (incremental).
>
> In other words to support IDE, we need:
>
>  * tolerant parser (required)
>  * incremental parser (optional)

Indeed, incremental parsing is a point I have missed. Though it's
surely not a requirement as far as the research done by Stephan is
concerned. It will already be a miracle if we find a library that can
address all the other wishes (while it's also fast enough).

Maybe the solution will be that only the expression parser uses a
lexer/parser generator, and the top-level language parser is
hand-written, so we have maximum flexibility. Expressions are usually
short and are always enclosed in top-level language constructs (i.e.,
in `${}` or in the arguments inside "FreeMarker tags"). When expression
parsing fails, we give up parsing the expression, but then find the
(suspected) end of the expression in the enclosing top-level construct
with some hand-written code (like we find the closing "}" of the "${"
that contains the malformed expression, intelligently skipping string
literals and such), and with that we are back to a normal parsing
state (we just have an error node inside that `${}`), so we can
continue parsing. So inside expressions we stop at the first error, we
aren't incremental, we do nothing fancy, but as this simplistically
parsed region ends at the end of the expression, it sounds acceptable
to me. The tricky part is the top-level language parser (even in FM2,
actually), where we want to continue after errors, maybe we want
incremental parsing, we want to make parse-time decisions based on the
runtime-provided Dialect, etc., and that's why a hand-written parser
could be beneficial there.
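
To illustrate the recovery step described above, here is a minimal
Java sketch of the hand-written "find the suspected end of the
expression" scan. Everything in it (the class name
InterpolationRecovery, the method names, the brace-depth counting) is
made up for illustration and is not existing FreeMarker code. It also
ignores raw string literals and comments, which a real implementation
would have to handle.

```java
// Hypothetical sketch; not part of FreeMarker.
final class InterpolationRecovery {

    /**
     * Scans forward from bodyStart (the index right after "${") and returns
     * the index of the "}" that closes the interpolation, or -1 if the source
     * ends before such a "}" is found. String literals are skipped, so a "}"
     * inside quotes is not mistaken for the end of the interpolation.
     */
    static int findInterpolationEnd(String src, int bodyStart) {
        int depth = 0; // nesting of "{...}" inside the expression, e.g. hash literals
        int i = bodyStart;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipStringLiteral(src, i);
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    return i; // the suspected end of the malformed expression
                }
                depth--;
            }
            i++;
        }
        return -1; // unterminated interpolation
    }

    /**
     * Returns the index right after the closing quote of the string literal
     * that starts at quoteIdx, or src.length() if the literal is unterminated.
     * A backslash escapes the character that follows it.
     */
    static int skipStringLiteral(String src, int quoteIdx) {
        char quote = src.charAt(quoteIdx);
        int i = quoteIdx + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character as well
                continue;
            }
            if (c == quote) {
                return i + 1;
            }
            i++;
        }
        return src.length();
    }
}
```

After the expression parser has failed somewhere inside a `${`, the
top-level parser would call something like findInterpolationEnd, put
an error node in place of the expression, and resume normal parsing
right after the returned index.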

If we go down the above path, then a requirement will be that we must
be able to run many little independent expression parsings without
much overhead. (At a quick glance JavaCC can't do that. Or at least we
would have to do some awkward hacks.)

> The Java JDT ICompilationUnit of Eclipse provides this feature. It's one
> reason why Java Editor completion, etc is so fast.
>
> Regards, Angelo
>
>
>
> 2018-08-07 1:59 GMT+02:00 Daniel Dekany <[email protected]>:
>
>> Sunday, August 5, 2018, 6:58:11 PM, Stephan Müller wrote:
>>
>> > Am 04.07.2018 um 19:28 schrieb Daniel Dekany:
>> >> I wonder what parser libraries could help us, in FM3, to separate the
>> >> expression language parsing from the top-level language (like
>> >> `<#foo>`, `${...}`, etc.) parsing. Or if a hand-written parser is an
>> >> acceptable compromise. It would be good if we can change the top-level
>> >> syntax and still reuse the expression syntax. (Or, replace the
>> >> expression syntax, and reuse the top-level one.) Like, somebody wants
>> >> a syntax like `#foo(exp)` instead of `<#foo exp>`, but still reuse the
>> >> expression syntax. (For me it was always part of the FM3 agenda,
>> >> though it might prove to be too much...)
>> >> [..]
>> >
>> > During the last days I had a high-level look at different parser
>> > generators, and as one might imagine, there are a lot of parser
>> > generators, with different licenses, different maturities, different
>> > states of maintenance and so on.
>> >
>> > Due to https://www.apache.org/legal/resolved.html I ignored all parser
>> > generators which may not be included in Apache projects because of their
>> > license, especially GNU GPL etc.
>> >
>> > IMHO this leaves us with:
>> >
>> > * LL(k) parsers: ANTLR, JavaCC and Grammatica
>> > * LALR parsers: CookCC
>> > * PEG parsers: Mouse
>> > * parser combinators: jparsec, parboiled and PetitParser
>> >
>> > This list is not exhaustive, so I probably forget some interesting
>> > projects. If so, please share, I'd like to have a look into these, too.
>> >
>> > My idea for the next step: define a really small subset of FTL and try
>> > to implement PoCs for this subset with the candidates which I mentioned
>> > above.
>> >
>> > The subset might be something like
>> >
>> > * interpolations: ${..}
>> > * directives: if, assign
>>
>> Just to be on the safe side, I will note that you shouldn't try to
>> hard-code parser logic that's specific to a directive (like "if").
>> Instead, you should try to parse a unified/generic directive call
>> syntax, and then invoke the Dialect to find out the further rules. And
>> that's tricky, as then the parser definition doesn't specify which
>> tags have an end-tag pair, and what can be nested between them, only
>> the Dialect knows that. Like, if you look at the current parser, it
>> basically says that "if" is like
>>
>>   "<#" "if" Expression ">" MixedContent "</#" "if" ">"
>>
>> which is expressive and all, but sadly it won't be possible in FM3 to
>> do it like that.
>>
>> > * expressions: numbers, variables, +
>> > * variants of the parsers with different delimiters
>> > * split into two parsers (interpolations/directives vs. expression
>> >   language)
>> >
>> > What do you think?
>>
>> I haven't used any parser library but JavaCC, so I have no tips
>> there. Otherwise the plan sounds good.
>>
>> Anyway, I kind of repeat myself here, but the expectations that may
>> filter down the candidates quickly:
>>
>> - Splitting into two parsers, of course
>>
>> - Maintainability of custom syntax variations (like new FreeMarker
>>   versions won't break them, or at least they need no manual work to
>>   regenerate them)
>>
>> - How parsing partially driven by the Dialect looks... it won't fit
>>   JavaCC well for example. (But, probably it won't be very nice with
>>   any of them.)
>>
>> In case multiple of the libraries stay alive, some further extras that
>> can decide:
>>
>> - More understandable/helpful error messages are a big plus.
>>
>> - It would be interesting to see how hard it is to write a parser that
>>   continues parsing after the first error, to catch more errors. This
>>   is mostly for IDE-s.
>>
>> > Stephan.
>> >
>> > P.S.: my more detailed list of parser generators can be found here:
>> > https://gist.github.com/chaquotay/8041096bad36f6f3f0d4166d6f8623b5
>>
>> --
>> Thanks,
>>  Daniel Dekany
>>
>>

-- 
Thanks,
 Daniel Dekany
