Re: [antlr-dev] ANTLRv4?

Kay Röpke Mon, 18 Jan 2010 01:27:26 -0800

Hi!

my couple of cents below.


On Jan 17, 2010, at 10:48 PM, Terence Parr wrote:

Hiya. I'm now ready to embark on ANTLR (and ANTLRWorks) code development after 2 years in book-writing mode. We've got an important decision to make: do I try to continue with the current ANTLR v3 source base or start from scratch heading to ANTLR v4? Here are some data points to consider:
* I must reimplement ANTLR v3 in v3, just like I did recently for ST (yielding ST v4). Besides being untidy, important projects like Eclipse cannot include ANTLR at the moment due to license restrictions on its v2 dependency :( Jim Idle has graciously built a proper ANTLR parser, AST builder, and AST tree grammar that I can use to get us to v3. :)

Building ANTLR-centric tools is a pain with the v2 grammars, let me tell you. So I regard the transition to v3 in v3 (or is that v4 in v4) as the most critical point, aside from the weird v2 language. BTW it is the same for ST3 being based on v2, which makes it really icky to build tools on top of it. What I've come to realize is that development tools make all the difference, both in producing and maintaining code. Essentially, if we want better adoption, we must provide an easy path for tools devs.

* While I tried to do as much re-factoring as possible while developing ANTLR v3, most of it was tactical. At some point, strategic re-factoring (rewriting whole sections or all) becomes necessary. For example, I literally had to jam grammar composition into the tool, leaving it fragile. It's becoming hard to fix things and add new features. A lot of the current features have been added while writing the first and second books. Doing so simultaneously was valuable from a feature and functionality point of view, but not from a code cleanliness point of view.

Every time I look at the tool code I cringe, to be honest. It's huge (for what it does) and at times I can't help thinking that it should be way more modular to make it easier to integrate (I sometimes have the need to generate ANTLR grammars and then compile/classload them into host code at runtime, talk about meta…). Honestly, the tool is something most users don't care about at all, so I see no problems with rewriting it from scratch (but of course one can reuse huge chunks of code as you pointed out).

* I have some important new features such as the better expression grammar stuff that would be inconvenient to implement in the current code base.


And incremental parsing ;)
But yes, I agree.

* Trying to jam a new ANTLR front end into the existing semantics engine and grammar analysis engine might be challenging. It would be hard to get all of the various pieces to hook up properly.
* Code generation. I learned a lot while building ANTLR v3's retargetable code generation system; as a result, it's not exactly the cleanest thing in the world. I also would like to restructure the generated parsers. E.g., I'm going back to exception handling for backtracking since it is much cleaner and likely faster, as long as I reuse the same exception object during backtracking. Targets without exception handling like C can continue to use the existing "if (failed) return" concept. I'd also like to manage my own arguments and return values stack so that rules predicated upon parameters don't fail during code generation (when I move a parameter reference outside of the defining function). Even if we stay with the exact same code generator and templates, all of you target developers will have some changes to make to keep in sync regardless.

+1

I'm still feeling the wish for a target-agnostic optimizer stage before code generation (although at times it would likely need to integrate with codegen). Having separate param/return stacks would make that easier, I guess. Kinda worried about efficiency and debugger support, though, but those could potentially be countered by enhanced ANTLRWorks integration.
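
The exception-based backtracking idea in the quoted bullet above can be sketched roughly as follows. This is only an illustration under stated assumptions, not ANTLR's generated code: the class `BacktrackSketch`, the exception type `BacktrackFailure`, and the two toy alternatives are all hypothetical names made up for this example. The one point it tries to show is that a single preallocated exception object, created without a stack trace, can be thrown repeatedly to unwind out of a failed alternative.

```java
import java.util.Arrays;
import java.util.List;

class BacktrackSketch {
    /** Hypothetical failure signal; built without a stack trace so throwing it stays cheap. */
    static final class BacktrackFailure extends RuntimeException {
        BacktrackFailure() {
            // enableSuppression = false, writableStackTrace = false
            super("backtrack", null, false, false);
        }
    }

    /** The single instance that is reused for every failed match. */
    private static final BacktrackFailure FAILED = new BacktrackFailure();

    private final List<String> tokens;
    private int pos = 0;

    BacktrackSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Consumes one token or throws the shared failure object. */
    private void match(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
        } else {
            throw FAILED;
        }
    }

    // Toy alternatives: ID '=' INT   |   ID '(' ')'
    private void assignment() { match("ID"); match("="); match("INT"); }
    private void call()       { match("ID"); match("("); match(")"); }

    /** Tries the first alternative, rewinds on failure, then tries the second. */
    String statement() {
        int start = pos;
        try {
            assignment();
            return "assignment";
        } catch (BacktrackFailure e) {
            pos = start; // rewind the input before trying the next alternative
        }
        call(); // a failure here propagates to the caller
        return "call";
    }

    /** Convenience wrapper for the example: parses a token list, reports the result. */
    static String parse(String... tokens) {
        try {
            return new BacktrackSketch(Arrays.asList(tokens)).statement();
        } catch (BacktrackFailure e) {
            return "no match";
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("ID", "=", "INT"));
        System.out.println(parse("ID", "(", ")"));
        System.out.println(parse("INT"));
    }
}
```

A target without exceptions would replace each `throw FAILED` with setting a flag and returning, and each call site with a check of that flag, which is the "if (failed) return" pattern the quoted text mentions.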

* The runtime library seems to be pretty good. I don't think I would change much of anything there except for efficiency stuff. With luck, a v4 or updated v3 version could also reuse most of the templates. I've tried to avoid gratuitous changes. That said, I don't think that templates are the problem. It's always the library that people have to build in order to make a target and those are already done. Perhaps target developers can voice their opinions here.

I agree that targets are mostly about the runtime libs, the templates are mostly just writing code against your own API at that point. Although most target development probably does its dev in lockstep (add this little ANTLR feature in templates, figure out the runtime API for it, tweak API, tweak templates etc). I don't think this is a problem, though.

* I have 8 months left on my sabbatical for full-time code development (i.e., no teaching duties). Sabbaticals come once every seven years. [Oh, feel free to start hating me now!] I suspect that I have about another year and a half or two years before I need to start writing again. First thing would probably be a v4 update for the definitive ANTLR reference guide.


I'm jealous now :P

* STv4 was remarkably easy and fast to build using the existing unit tests and reference implementation (ST v3). There was a lot of cutting and pasting, but most importantly, there was no decision-making really. I knew exactly what I wanted to build. Deciding on features and functionality is always the hard part. Coding is easy. I estimate it would take me six months to make a v4 ANTLR. I'd shoot for simply a better implementation of the current functionality set and then worry about the new features. The NFA->DFA conversion algorithm is the most difficult component, but one that I could reuse almost completely.

The internal machine should not have to change significantly, agreed. Mostly internal API usage changes, I assume. Mostly a non-issue.

So, let's open this up for discussion. What are the pros and cons of taking the six months to build ANTLR v4? What effect would it have on the project? Are there some things that need to be fixed immediately? Would such changes get thrown out when I rebuild the front end (which we *must* do to remove the v2 dependency)? Are there changes I could make that would help other tool developers like Gerald's Eclipse plug-in? I'd like to make it as easy as possible for people to integrate ANTLR stuff.


The cons I can see:

- increased context switching between v3 and v4 dev. Bugs may need fixing in v3 while the attention is on v4. Occupational hazard.
- risk of losing the odd target in the beginning, due to incompatible changes which invariably will arise. Just a problem with timing.
- might need more time to stabilize in the beginning, new bugs. Always happens with rewrites, but might also shake out existing bugs.


The pros:

- Better chance of integration, due to clear license and easier-to-reuse Tool code. That's the big thing.

- Can we please use loggers in Tool and related things? :)

- Could open the possibility to extend ANTLR without forking it. Meaning, I would like to see "extension points" a la Eclipse that let me tweak code gen, for example (think instrumentation or optimizer modules).

- Change to make it OSGi-friendly (the tool I mean).
- more I can think of after more coffee :)
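
To make the logger and "extension points" wishes in the list above a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `ExtensionPointSketch`, `CodeGenExtension`, and the fake one-line "generated code" are hypothetical and do not correspond to any existing ANTLR API. It only shows the shape of the idea: the tool logs through `java.util.logging` instead of printing, and passes its output through registered hooks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

class ExtensionPointSketch {
    /** Hypothetical extension point: a hook that may rewrite generated code. */
    interface CodeGenExtension {
        String name();
        String transform(String generatedCode);
    }

    // A logger instead of System.out, so the embedding host decides where messages go.
    private static final Logger LOG =
            Logger.getLogger(ExtensionPointSketch.class.getName());

    private final List<CodeGenExtension> extensions = new ArrayList<>();

    /** Registers a hook; hooks run in registration order. */
    void register(CodeGenExtension ext) {
        extensions.add(ext);
        LOG.fine("registered extension " + ext.name());
    }

    /** Stand-in for code generation: emits a stub method, then applies each hook. */
    String generate(String ruleName) {
        String code = "void " + ruleName + "() { /* rule body */ }";
        for (CodeGenExtension ext : extensions) {
            LOG.fine("applying extension " + ext.name());
            code = ext.transform(code);
        }
        return code;
    }

    /** Example: an instrumentation hook that injects a trace call at method entry. */
    static String demo() {
        ExtensionPointSketch tool = new ExtensionPointSketch();
        tool.register(new CodeGenExtension() {
            @Override public String name() { return "trace-instrumentation"; }
            @Override public String transform(String code) {
                return code.replace("{ ", "{ trace(\"enter\"); ");
            }
        });
        return tool.generate("expr");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real tool the hooks would more likely be discovered than registered by hand, for example through `java.util.ServiceLoader` or an OSGi service registry, which is where the OSGi-friendliness point would come in.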

ok, who's freaked out now? ;)


/me. in general :)

cheers,
-k

--
Kay Röpke


_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
