> Richard Lewis wrote:
>> This is not an intellectual exercise or a homework assignment. The Lisp
>> in question is a peculiar dialect of CLOS developed for our clients to be
>> able to embed scripting behavior in our application suite. The interpreter
>> has been tweaked to the point where we can no longer squeeze out any
>> more performance gains. There are tens of thousands of lines of code
>> distributed amongst many clients, so changing the language in any way is
>> not an option.
>>
>> So the next step is to have it compiled natively.
Oh boy... When I first read this, my very first thought was "you should see what Autodesk did and take that as a [strong] hint", and the more I see appear in this thread, the stronger that feeling gets. For those who say "Auto-who?!", think of the number one vendor of CAD solutions worldwide (AFAIK). They have offered LISP 'scripting' since about the stone age, and they have hit this (and other) barriers over the years. FYI: http://en.wikipedia.org/wiki/AutoLISP or google 'AutoLISP' or 'Visual LISP'.

You mention your bottleneck is *performance*, and this is me reading between your lines, but my impression is that it is such an important bottleneck that you are willing to spend the effort to change from the existing interpreter to something entirely different, e.g. a [JIT?] compiler, or even another scripting language. That means the number of passes of the compiler/translator-to-be is not the most important issue here.

Since it's SPEED you need, let me be very clear: if you do not have extensive, readily available knowledge of and experience with constructing optimizing compilers and/or language translators, you are at the start of an unfathomable battle. And assuming you have those abilities aplenty, there is still the issue of 'maintenance', which splits two ways:

a) When your 'translated' code will ever (if only 'just once') hit human retinae, you have just chosen to walk the path to utter damnation, because machine-translated code is /never/ maintainable in the long run; any code of that sort gets rewritten as quickly as possible, driving up maintenance cost, since it is basically two translation efforts smeared out over time (once by machine, once by hand, with the manual effort at an additional disadvantage), so TCO goes through the roof. (But smart accountants know where to hide that long-term result. What's the average lifetime of a board-level occupation, after all?)

b) When your 'translated' result is strictly for machine eyes only, which means you are not translating but /compiling/ your source code, you can either do it all from scratch or buy a [half-]product. In this case one can argue it is a technology-only issue (budgets allowing), where your need for speed (there it is again) implies that your major effort will /not/ go into creating the parser (i.e. the ANTLR work) but into the optimizer part of your compiler, i.e. adding the smart AST rewriter(s), etc. (see the sketch below). In other words: ANTLR will only be the start of it, if you wish to build your own translator. (A human expert interpreting and re-implementing the 'original intent' by hand can be quite competitive here.)

Alternatively, you can seek an already available [partial] solution for choice (b): a solution which you pick because its generated output is extremely fast already, and then spend your remaining budget and time on integrating and tweaking that engine. Run-time speed of the generated output would win over the parser generator tools used, I'd say. If your company decides to take the (b) route, I suggest you take this latter choice: acquire an existing translator/compiler engine and tweak and embed that one. If it happens to use ANTLR, more power to it and to you, but ANTLR, in this particular case, is part of the easy 20%, not the back-breaking 80%.
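To give a feel for what that 'smart AST rewriter' work looks like, here is a minimal, hypothetical sketch (plain Java, not tied to ANTLR or to any real grammar; Node, Num, Add and fold are made-up names for illustration): a single constant-folding pass over a toy expression AST. A real optimizer is a whole stack of passes like this, each with its own correctness obligations, and /that/ is where the back-breaking 80% lives.

// Minimal sketch of one optimizer pass: constant folding over a toy AST.
// All names (Node, Num, Add, fold) are hypothetical, for illustration only.
abstract class Node {}

class Num extends Node {
    final long value;
    Num(long value) { this.value = value; }
}

class Add extends Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
}

final class ConstantFolder {
    // Rewrite bottom-up: fold Add(Num, Num) into a single Num node.
    static Node fold(Node n) {
        if (n instanceof Add) {
            Add a = (Add) n;
            Node l = fold(a.left), r = fold(a.right);
            if (l instanceof Num && r instanceof Num) {
                return new Num(((Num) l).value + ((Num) r).value);
            }
            return new Add(l, r);
        }
        return n; // leaves pass through unchanged
    }

    public static void main(String[] args) {
        // (1 + (2 + 3)) folds down to the single constant 6
        Node tree = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(((Num) fold(tree)).value); // prints 6
    }
}

The pattern scales but never gets easier: every optimization is 'walk the tree, recognize a shape, rewrite it', and the engineering cost sits in getting the recognized shapes, and their interactions across passes, right.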
Please backpedal, and your board of directors should (re)convene on this as well, as it is /them/ who must make the /strategic/ choice of where your money (& time) will be spent:

1) Build a compiler/translator from scratch (and state very explicitly whether the output is for humans or 'strictly machine only'; others may prefer to phrase this as 'output is editable (yes/no)' to be able to discover the significant consequences of such a decision). Building or purchasing is only a secondary choice here.

2) Take the route Autodesk effectively did: if it's not fast enough for 'you' (for any user value of 'you'), there's an alternative user-scriptable ABI/API here for you. Autodesk offers native C++ interfaces for those folks who need near-bare-metal speed. Note that in 20+ years they have switched their LISP interpreter only once, replacing the old one with a new development environment, but it has remained an interpreter all the same. The major /strategic/ choice /they/ made was adding other extensibility options: their modern C++ and .NET interfaces.

3) Phase out your LISP support and substitute something else; during the transition phase your company can offer translation services for those users who need them. (I wouldn't advise this right away, because LISP is a quite different animal, and once you get used to it... If Autodesk were to announce a choice like this, you'd get spontaneous lynching parties. If the flame wars in their discussion boards over the years about whether to lisp or not to lisp are any indication: oy oy oy!)

I like ANTLR very much, and for what it does I consider it the best tool around by a very large margin, but this way lies lunacy as well if you're not mindful of the consequences of taking this road. Context is everything, sometimes. Reread the messages by Mr. Kaplan twice for good measure; also reread the messages by the other posters.

Then, if you still wish to persevere, I'd say to your question ("My question was one of style with using Antlr... Is it better to muddy the parser grammar with a single pass or split the AST creation into 2 passes?", or rather: "Where is the best place to attach semantic information?") that the 'right' answer is one of taste. My answer is: "Spread the wealth. Do both."

Elaboration: I've done both single- and multi-pass, and I've even considered it a particularly enticing challenge to roll it all into a strictly single pass (a challenge I've put to myself twice, just to prove to myself I could do it. Sigh. The male species...). The point here: single-pass sounds nice, but taking it to the extreme will make it a maintenance/readability headache. In actual practice, the engines which I still like to 'grok' today, even though I wrote them a long time ago, are the ones that use a single, fast pass to turn text into an AST, add the semantics as far as was /easily/ doable during that phase, then update/extend those semantic attributes in subsequent 'passes', i.e. AST traversals/rewrites (a sketch of this shape follows below). Not too amazingly, those 'both passes' solutions also happen to be the fastest, even when compared to my oh-so-mentally-challenging single-pass compilers/interpreters.

So: 'add what you can, when and where it feels right'. No 'best' there; it quickly becomes similar to the question 'which programming language is best?' -- too little context to remain truly meaningful. And here's a thought about that single- versus double-pass doodah: all compilers, even the single-pass ones, are multi-pass. It just depends on what you define as 'a pass'.
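As a concrete (and entirely hypothetical, hand-rolled) sketch of that 'do both' shape, assuming a toy AST node with mutable attribute slots (AstNode and SemanticPass are made-up names): pass 1 attaches whatever is cheap to know while the tree is being built; pass 2 is an ordinary AST traversal that fills in the attributes which need whole-tree context.

import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "do both" shape: cheap semantics at build
// time, expensive semantics in a later AST traversal.
class AstNode {
    final String token;        // what the parser saw
    final List<AstNode> kids;
    String type;               // cheap: knowable while building (literals)
    String resolvedSymbol;     // expensive: needs whole-tree context

    AstNode(String token, List<AstNode> kids) {
        this.token = token;
        this.kids = kids;
        // Pass 1 (during parsing): attach what is /easily/ known right here.
        this.type = token.matches("\\d+") ? "int" : null;
    }
}

final class SemanticPass {
    // Pass 2 (an AST traversal): fill in attributes that need full context,
    // here by resolving names against a now-complete symbol table. A real
    // pass would also distinguish operators from identifiers.
    static void run(AstNode n, Map<String, String> symbols) {
        if (n.type == null) n.resolvedSymbol = symbols.get(n.token);
        for (AstNode kid : n.kids) run(kid, symbols);
    }

    public static void main(String[] args) {
        AstNode tree = new AstNode("+", List.of(
                new AstNode("42", List.of()),
                new AstNode("counter", List.of())));
        SemanticPass.run(tree, Map.of("counter", "global:counter"));
        System.out.println(tree.kids.get(1).resolvedSymbol); // global:counter
    }
}

Nothing here is ANTLR-specific: in an ANTLR setup, pass 1 would live in the grammar actions or tree construction, and pass 2 in a tree walker.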
The only truly 'one-pass compilers' (and the 'one-pass interpreter' I once created using bison/yacc) are non-optimizing, straight translators/interpreters/compilers. Since LISP and C/C++ are culturally (to borrow Mr. Kaplan's / Prof. Eco's phrasing) very far apart, I'm willing to bet you'll need multi-pass anyway to get anywhere near a viable translation.

And then... what is an optimizer but another couple of passes, where 'pass' == 'a traversal of the current code representation', i.e. an AST traversal, once you're past the lexer+parser (ANTLR) and have arrived at the 'backend'? That is multiple passes, or at /least/ one more with a bit of extra hither-and-yon, which is mandatory when you wish to rewrite (== optimize) your AST after all.

The number of passes might seem to matter at first, but in the end I find it's not so important. There are more O(N) factors than just the number of passes. What I try to do myself is one fast pass lexing+parsing the human-readable stuff into some form of AST (FAST, as this pass cuts multi-character 'slow' text access down to 'fast' single ID/integer-based token access; see the sketch below), done fast and easy (ANTLR can be quite useful there, though I confess I tend to hand-code almost all my lexers), then do the complex voodoo in the AST traversals. Try to cut down on the number of traversals, or rather, on the number of nodes visited, and, when you can, cut down on heap dynamics (malloc/free/new/delete), both the explicit and the implicit ones, as those can eat an inordinate amount of time when you're not careful.
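To make that token point concrete, here is a minimal, hypothetical sketch (plain Java; TokenInterner and its methods are made-up names for illustration, not an ANTLR facility): intern each distinct lexeme once during lexing, so every later pass compares and switches on small integers instead of re-hashing strings.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "slow multichar -> fast integer token" step:
// intern each distinct lexeme once; later passes work with cheap ints.
final class TokenInterner {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> lexemes = new ArrayList<>();

    int intern(String lexeme) {
        // computeIfAbsent avoids a second lookup on the hot path
        return ids.computeIfAbsent(lexeme, s -> {
            lexemes.add(s);
            return lexemes.size() - 1;
        });
    }

    String lexeme(int id) { return lexemes.get(id); }

    public static void main(String[] args) {
        TokenInterner t = new TokenInterner();
        int defun = t.intern("defun");
        int foo   = t.intern("foo");
        // Later passes compare small ints, no string hashing or comparison:
        System.out.println(t.intern("defun") == defun); // true
        System.out.println(t.lexeme(foo));              // "foo"
    }
}

The same idea is baked into LISP itself, by the way: interned symbols make symbol comparison a single pointer/integer check by design.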
And that's enough for now. It's high time for some shuteye here.

--
Met vriendelijke groeten / Best regards,

Ger Hobbelt
--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   [email protected]
mobile: +31-6-11 120 978
--------------------------------------------------