Hello! A little backstory (this will be a long message):
I've been working on an editor for my favorite language, CFML, on and off for years now. The editor is based on Eclipse. Right now we parse code with a hand-made kludge job. I want to take things to a higher level, so I thought, "hey, this ANTLR thing looks pretty nifty!". That was a couple of years ago. I've messed with writing a grammar, on and off, and haven't gotten far. Someone recently said they'd buy me The Book, which will probably help, but what I have to do seems pretty hard, and just lurking on this list for a bit makes me feel even more out of my depth, which isn't "teh awesome", so to speak. I'm not really too concerned with being out of my depth, as the kinds of deep-water sharks/tentacled beasts I fear don't live on the internet, but I do wonder about the best way to achieve my goal.

Forgive my probably lame questions. Here is an example of the source code that I have to parse (it's a markup/scripting language, sometimes mixed with HTML, similar to PHP):

    <html>
    <cffunction name="test">
        <cfargument name="fred" test="test"/>
        <cfscript>
            WriteOutput("FREDFREDFRED");
            somethinghere = 343;
        </cfscript>
        <cfif thisisatest is 1>
            <cfoutput>#fred#</cfoutput>
        </cfif>
    </cffunction>
    <cfscript>
        todaysDate = now();
        function doSomething(String doWhat) {
            var done = arguments.doWhat & " later";
            return done;
        }
        function returnSomething(theThing) {
            return theThing;
        }
    </cfscript>
    <cfset fred = 2/>
    <cfset bob = doSomething("build a parser") />
    <cfset test(fred)/>
    <cffunction name="test" >
        <cfset var woo="hoo" />
        <cfargument name="test" default="#WriteOutput("">"")#"/>
        <!--- I think this is valid! --->
    </cffunction>
    <body>
    <cf_myCustomTag action="rock">
    <cfoutput>
        This is a <b>test</b> #fred#
    </cfoutput>
    <table>
        <tr>
            <td style="<cfoutput>#somethinghere#</cfoutput>">asdfasdf</td>
            <td style="fred"></td>
        </td>
    </table>
    </body>
    </html>

That's some of the nastiest bastard data, as an example. Generally it's far better than that.
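One way to picture the "sub-grammar" idea for a file like the example above is as two layers (an island grammar, or in ANTLR terms a lexer that switches modes): an outer pass that only recognises CFML comments, `<cf...>` tags and `<cfscript>` bodies and treats everything else as opaque text, and an inner parser that only ever sees script. The Java sketch below shows just that outer pass, using a regular expression instead of ANTLR. The names `CfmlSegmenter`, `Segment` and `split` are hypothetical, made up for illustration; it is a sketch under simplifying assumptions (no nested comments, no `#expr#` interpolation, no tag nesting), not a complete CFML lexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical outer pass: splits CFML source into text, comments, cf tags and script bodies. */
public class CfmlSegmenter {

    /** Kinds of region the tag-level pass distinguishes. */
    public enum Kind { TEXT, CF_COMMENT, CF_TAG, SCRIPT }

    /** One region of the source: its kind, its text and its start offset in the source. */
    public record Segment(Kind kind, String text, int start) {}

    // Alternatives are tried in order at each position:
    //   1. <!--- ... ---> comments (CFML allows nesting them; this lazy match does not)
    //   2. <cfscript> ... </cfscript>, capturing only the body in group 1
    //   3. any other <cf...> or </cf...> tag; quoted attribute values are skipped as units,
    //      so a '>' inside quotes does not end the tag
    private static final Pattern REGION = Pattern.compile(
            "<!---.*?--->"
            + "|<cfscript\\s*>(.*?)</cfscript\\s*>"
            + "|</?cf[\\w:.]*(?:\"[^\"]*\"|'[^']*'|[^'\">])*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static List<Segment> split(String src) {
        List<Segment> out = new ArrayList<>();
        Matcher m = REGION.matcher(src);
        int last = 0;
        while (m.find()) {
            // Everything between two recognised regions is plain (HTML or other) text.
            if (m.start() > last) {
                out.add(new Segment(Kind.TEXT, src.substring(last, m.start()), last));
            }
            String hit = m.group();
            if (hit.startsWith("<!---")) {
                out.add(new Segment(Kind.CF_COMMENT, hit, m.start()));
            } else if (m.group(1) != null) {
                // Only the script body is handed to the (separate) script parser.
                out.add(new Segment(Kind.SCRIPT, m.group(1), m.start(1)));
            } else {
                out.add(new Segment(Kind.CF_TAG, hit, m.start()));
            }
            last = m.end();
        }
        if (last < src.length()) {
            out.add(new Segment(Kind.TEXT, src.substring(last), last));
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "<html><cfset fred = 2/><cfscript>x = 1;</cfscript><!--- note ---><b>#fred#</b></html>";
        for (Segment s : split(src)) {
            System.out.println(s.kind() + " @" + s.start() + " [" + s.text() + "]");
        }
    }
}
```

Because each SCRIPT segment keeps its start offset, errors reported by the inner script parser can be mapped back to positions in the editor buffer. CFML's doubled quotes, as in `default="#WriteOutput("">"")#"`, happen to come out as three adjacent quoted strings under this pattern, so the `>` inside them does not end the tag.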
I wrote something that uses the Jericho HTML lib to parse the tags, and that works well enough, I guess. When I hit a <cfscript> tag I hand it off to another (broken) parser. The cfscript stuff is ECMAScript-ish, so I think I can modify an existing grammar and get the broken parser going (I don't have as much trouble modifying stuff as creating it), but how would you guys go about parsing something like this? Should I try to write an overall ANTLR grammar for everything, maybe with a sub-grammar-type-deal for the script stuff? Or just say screw it, and stick to using ANTLR for only the ECMAScript-like portion?

It gets a lot more complicated than the above code example, too, even for just the script stuff. There are a few CFML engines: some care about semicolons and some don't (I've seen that handled elsewhere, so I'm not too worried about it), and some support different kinds of "for" loops, etc. (I'm more worried about things like this). They also change by version, and in a perfect world I'd like to support the different versions.

I have to be honest: a few years ago I didn't know anything about ASTs, lexing and parsing. Maybe in some abstract form, but not like I do now (a lot more, relatively). And I *still* don't think I've totally (or even "very much") grokked it, or I wouldn't be asking these questions. I'm wondering if I'm insane for thinking about using ANTLR for the "whole shebang". In the few years that I've been watching ANTLR, lots of nifty stuff has been added, which at least makes me think it's not as crazy an idea as it once seemed. But even if it's not a crazy idea, it's probably too much to bite off at once, neh? Maybe I should stick to futzing with one of the existing ECMA grammars for the script-like portions, and try to wrap my head around ANTLR and parsing in general more first? Or start from scratch and actually learn this stuff?
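For the engine/version differences, one common approach is to keep a single script grammar and switch individual alternatives on or off from a dialect profile, which is roughly what guarding an alternative with a semantic predicate (`{...}?`) does in an ANTLR grammar. The plain-Java sketch below illustrates only the "for" loop case. Everything in it is hypothetical: the `DialectGate`, `Dialect` and `Feature` names, the dialect names, and the assumption that the loop header arrives already tokenised. It is not a description of how any real CFML engine behaves.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: gating grammar alternatives on a per-engine/per-version dialect profile. */
public class DialectGate {

    /** Hypothetical script features that differ between engines/versions (more would go here). */
    public enum Feature { FOR_IN_LOOP }

    /** Hypothetical dialect profile: an engine/version name plus the features it accepts. */
    public record Dialect(String name, Set<Feature> features) {
        public boolean allows(Feature f) {
            return features.contains(f);
        }
    }

    /** Which form of "for" header was recognised. */
    public enum ForKind { CLASSIC, FOR_IN }

    /**
     * Classifies the tokens between the parentheses of a "for" header.
     * Classic form: init ; cond ; step. For-in form: identifier "in" expression
     * (this sketch does not handle a leading "var").
     * The for-in alternative is only accepted when the dialect allows it, which plays
     * the role of a semantic predicate guarding that alternative.
     */
    public static ForKind classifyFor(List<String> header, Dialect dialect) {
        boolean looksLikeForIn = header.size() >= 3 && header.get(1).equalsIgnoreCase("in");
        if (looksLikeForIn) {
            if (!dialect.allows(Feature.FOR_IN_LOOP)) {
                throw new IllegalArgumentException(
                        "for-in loops are not supported by dialect " + dialect.name());
            }
            return ForKind.FOR_IN;
        }
        // The classic form has exactly two ';' separators inside the parentheses.
        long semicolons = header.stream().filter(";"::equals).count();
        if (semicolons == 2) {
            return ForKind.CLASSIC;
        }
        throw new IllegalArgumentException("malformed for header: " + header);
    }

    public static void main(String[] args) {
        Dialect older = new Dialect("older-engine", EnumSet.noneOf(Feature.class));
        Dialect newer = new Dialect("newer-engine", EnumSet.of(Feature.FOR_IN_LOOP));
        List<String> forIn = List.of("item", "in", "collection");
        List<String> classic = List.of("i", "=", "1", ";", "i", "<=", "10", ";", "i", "++");
        System.out.println(classifyFor(forIn, newer));   // FOR_IN
        System.out.println(classifyFor(classic, older)); // CLASSIC
        try {
            classifyFor(forIn, older);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // for-in loops are not supported by dialect older-engine
        }
    }
}
```

The design choice here is that a new engine version becomes a new `Dialect` value rather than a new grammar, so the grammar and its tests stay shared across versions.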
I'll probably be the one working on the grammar in the future, so though I'm tempted to try to get someone to donate time/money==grammar, I want to learn. But I don't have another few years before I have to produce something, so what's the practical approach, given this long and not-very-well-expressed background? Apologies for framing my questions as poorly as I fear I have. =)

:Den

-- 
If all mankind minus one were of one opinion, mankind would be no more justified in silencing that one person than he, if he had the power, would be justified in silencing mankind. John Stuart Mill