Nice, philosophical post, Tim. I agree with Geertjan that it should be published at https://blogs.apache.org/netbeans/ - that's the kind of overview that shouldn't be lost in an email conversation.

-jt
On Mon, Aug 5, 2019 at 20:24, Tim Boudreau <niftin...@gmail.com> wrote:
>
> > I was just curious about the theoretical aspect of parsing. Isn't there a
> > unified parsing API, using ANTLR/lex/yacc which can parse any language
> > given a grammar for it? Why do we use a different parsing implementation
> > (like Graal JS parser in this instance) when a unified approach will help
> > us support lots of languages easily?
>
> First, in an IDE, you are *never* just "parsing". You are doing *a lot*
> with the results of the parse. An IDE doesn't have to just parse one
> file; it must also understand the context of the project that file lives
> in; how it relates to other files and those files' interdependencies;
> multiple versions of languages; and the fact that the results of a parse
> do not map cleanly to a bunch of stuff an IDE would show you that would be
> useful. For example, say the caret is in a Java method, and you want to
> find all other methods that call the one you're in and show the user a list
> of them. The amount of work that has to happen to answer that question is
> very, very large. To do that quickly enough to be useful, you need to do
> it ahead of time and have a bunch of indexing and caching software behind
> the scenes (all of which must be adapted to whatever the parser provides)
> so you can look it up when you need it. In short, a parser is kind of like
> a toilet seat by itself. You don't want to use it without a whole lot of
> plumbing attached to it.
>
> Second, while there are tools like ANTLR (version 4 of which is awesome, by
> the way), there is still a lot of code you have to write to interact with
> the results of a parse to do something useful beyond syntax coloring in an
> IDE. One of my side projects is tooling for NetBeans that *does* let you
> take an ANTLR grammar and auto generate a lot of the features a language
> plugin should have. Even with that being almost completely declarative, you
> wind up needing a lot of code. One of the languages I'm testing it with is
> a simple language called YASL which lets you define JavaScript-like schemas
> with validation constraints (e.g., this field is a string, but it must be
> at least 7 characters and match this pattern; this is an integer number
> but it must be greater than 1 and less than 1000 - that sort of thing). All
> the parsing goodness in the world won't write hints that notice that, say,
> the maximum is less than the minimum in an integer constraint and offer to
> swap them. Someone has to write that by hand.
>
> Third, in an IDE with a 20 year history, a lot of parser generating
> technologies have come and gone - JavaCC, Java CUP, ANTLR, and good old
> hand-written lexers and parsers. Unifying them all would be an enormous
> amount of work, would break a lot of code that works just fine, and the end
> result would be - stuff we've already got, that already works, just with
> one-parser-generator-to-rule-them-all underneath. Other than prettiness, I
> don't know what problem that solves.
>
> So, all of this is to say: We use different parsing implementations
> because parsing is just a tiny piece of supporting a language, so it
> wouldn't make the hard parts easier enough to be worth it. And there will
> be new cool parser-generating technologies that come along, and it's good
> to be able to use them, rather than be married to
> one-parser-generator-to-rule-them-all and have this conversation again,
> when they come along.
>
> -Tim
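
To make the "someone has to write that by hand" point concrete, here is a minimal Java sketch of the kind of hint Tim describes: it notices an integer constraint whose maximum is below its minimum and offers a swapped version as the fix. Everything here is an assumption made for illustration. IntConstraint, Hint and MinMaxHintCheck are hypothetical names, not part of YASL, ANTLR or the NetBeans hints API, and a real plugin would work on the parser's own tree rather than a hand-built object.

```java
import java.util.Optional;

/** Hypothetical model of an integer constraint, as a parse result might be adapted. */
final class IntConstraint {
    final String fieldName;
    final long min;
    final long max;

    IntConstraint(String fieldName, long min, long max) {
        this.fieldName = fieldName;
        this.min = min;
        this.max = max;
    }

    /** Returns a copy with the two bounds exchanged. */
    IntConstraint swapped() {
        return new IntConstraint(fieldName, max, min);
    }
}

/** Hypothetical hint: a message for the user plus the proposed replacement. */
final class Hint {
    final String message;
    final IntConstraint fix;

    Hint(String message, IntConstraint fix) {
        this.message = message;
        this.fix = fix;
    }
}

/** Hand-written semantic check; a grammar alone cannot express this rule. */
final class MinMaxHintCheck {
    static Optional<Hint> check(IntConstraint c) {
        if (c.max < c.min) {
            String message = "Maximum " + c.max + " is less than minimum " + c.min
                    + " for '" + c.fieldName + "'; swap them?";
            return Optional.of(new Hint(message, c.swapped()));
        }
        // Bounds are already in order, so there is nothing to suggest.
        return Optional.empty();
    }
}
```

The parser only tells you that both bounds are well-formed integers; deciding that the pair is suspicious, wording the message and building the fix are all per-language code of this sort.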