Quoting Steve Reinhardt <[email protected]>:
> (Is there a reason we're not having this conversation on m5-dev? I
> moved it back there for posterity, and in case anyone else wants to
> chime in. Note that we're discussing better processing of the C++
> code snippets in the ISA description language.)
>
> On Tue, Aug 25, 2009 at 1:08 AM, Gabe Black<[email protected]> wrote:
>> nathan binkert wrote:
>>>> One other possible answer is to try and build a reasonable approximate
>>>> C++ parser that's good enough for the snippets that we use. That
>>>> would be non-trivial, but since (1) we don't have to generate code, so
>>>> we only have to be more accurate than the current regex system and (2)
>>>> it's not the end of the world if there are tricky things the parser
>>>> can't handle and we're forced to rewrite a few code snippets, I don't
>>>> see it as an impossible task. I think there are multiple
>>>> public-domain C++ grammars we could start with too.
>>>>
>>>
>>> I have a couple of comments here. First, C++ can't be parsed with
>>> ply. I'm not sure which parts of the language are the problem, but
>>> the language is ambiguous and not context free. That said, what
>>> exactly are the problems with what we have? I can try to see if I can
>>> improve things (or teach gabe enough about ply so he can do it.)
>>>
>>> Nate
>>>
>>
>> I actually try to avoid the problem areas so I can't list them
>> exhaustively, but basically the way things work follows this basic rule
>> (I think). If the name of the operand appears in the text of the code
>> with or without an optional type modifier, it's an operand. If it's in
>> front of an equal sign, it's a destination, if not, it's a source. Even
>> though that's pretty simple it works remarkably well. Unfortunately it's
>> confused by things like pass by reference function arguments, using it
>> as a temporary without actually meaning to access its original value
>> (ie.
>> reading it to compute flag bits), setting it conditionally, and
>> maybe a few other things. It would be really hard to get those things
>> right without understanding the syntax of C++, and even then, without
>> knowing how functions are defined, etc., perfectly parsing the C++ won't
>> give you all the information you might need. That's what makes making
>> g++ figure it out attractive since it necessarily figures out all those
>> things at some point. The hard/impossible part is tricking it into using
>> that information to set up the operand index arrays in the static inst,
>> set up the reading and writing code, etc. I think templates kind of,
>> sort of might do the trick, but I just don't think you can get it to
>> automatically fill in the members of a class at construction time based
>> on the code in its member functions.
>
> Yes, doing a full parse is impossible for a number of reasons, not
> just the fact that C++ is context sensitive, but that in the case of
> the code snippets you don't even have all the context (and I think
> trying to generate the full context as Gabe is suggesting is probably
> impractical, as that would require sucking in lots of header files for
> each snippet and only lengthen compile times even further). That's
> why I said "approximate".
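A rough Python sketch of the regex rule Gabe describes makes it easier to see exactly where it goes wrong. This is not the M5 ISA parser's actual code: `classify_operands`, the `.suffix` type-modifier syntax (in the style of `Ra.uq`), and the operand names `Ra`, `Rb`, `Rc` are assumptions for illustration. The usage lines reproduce the comment, temporary-use, pass-by-reference, and conditional-update cases from the thread.

```python
import re


def classify_operands(code, operand_names):
    """Hypothetical sketch of the regex rule: any mention of an operand
    name (optionally followed by a .type suffix) makes it an operand; a
    mention directly followed by '=' (but not '==') makes it a destination,
    and any other mention makes it a source."""
    sources, dests = set(), set()
    for name in operand_names:
        pattern = re.compile(rf'\b{re.escape(name)}(?:\.\w+)?\b')
        for m in pattern.finditer(code):
            # Only the text immediately after the mention is examined.
            if re.match(r'\s*=(?!=)', code[m.end():]):
                dests.add(name)
            else:
                sources.add(name)
    return sources, dests


ops = ["Ra", "Rb", "Rc"]
# A mention inside a comment still counts as a read.
src, dst = classify_operands("Rc = Ra; // Rb is unused", ops)
print(sorted(src), sorted(dst))  # ['Ra', 'Rb'] ['Rc']
# Rc is used as a temporary, but its original value shows up as a source.
src, dst = classify_operands("Rc = Ra + Rb; Rc = Rc & 0xff;", ops)
print(sorted(src), sorted(dst))  # ['Ra', 'Rb', 'Rc'] ['Rc']
# If setFlags writes Rc through a reference, that write is missed entirely.
src, dst = classify_operands("setFlags(Rc, Ra);", ops)
print(sorted(src), sorted(dst))  # ['Ra', 'Rc'] []
# A conditional write looks exactly like an unconditional one.
src, dst = classify_operands("if (Ra == 0) Rc = Rb;", ops)
print(sorted(src), sorted(dst))  # ['Ra', 'Rb'] ['Rc']
```

Nothing in this rule can notice that it has been fooled, which is the "silent error" problem discussed in the thread.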

I was thinking that might happen as part of the C++ compile phase,
independent of the ISA parser. It would be pretty hard or maybe even
impossible to get C++ to manage things for us, though.

>
> My (half-baked) thought was to build a parser that at least understood
> the basics of C++ expression syntax and could parse the snippets by
> making some charitable assumptions about what was a type and what was
> not (or perhaps we could require the use of typename declarations...
> I'd hope not too much, but it could be a fallback for resolving
> ambiguities). Note that we already effectively restrict these
> snippets to a subset of C++ to avoid confusing the regexes, so I'm
> sure whatever we do would enable a larger subset than what's currently
> supported.
>
> I think this would solve most of Gabe's issues, since it could tell
> when the only read of an operand occurs after a write, not get
> confused by operand mentions in comments, robustly distinguish RHS
> from LHS of assignments, etc. More importantly, it would solve the
> biggest problem with the status quo, which is that right now there's
> no indication that the regex scan is getting confused because you've
> strayed out of the supported subset and encountered any of Gabe's
> issues; you have to look at the instruction object definition and
> notice that the operand list is not what you expected. A key
> potential capability of a real parser would be for it to robustly
> determine when it can't figure out what's going on, so at least we
> could avoid these silent errors.

That would be great.

>
> Note that some of Gabe's issues aren't related to the parser and are
> more fundamental. In particular:
> - It's not clear what to do about conditional updates. They can't
> really be handled properly in hardware in the face of register renaming,
> so my inclination is that if the parser could recognize situations
> where an update only occurs on one branch of an if statement then it
> should flag the snippet as an error.
> I'm not sure what Gabe has in
> mind. There's no support in any of our models for indicating a
> conditional output anyway.

That seems like a reasonable thing to do.

> - Pass by reference operands should also just be flagged as errors,
> since there's no way to know if the operand is read, written, or both.

It's more difficult than that, though, since, as far as I know, it's not
possible to tell when an operand is passed by reference without a
prototype.

Gabe

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
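Steve's approximate scanner could start as something like the following Python sketch (Python because the thread's parser tooling is ply-based). It is hypothetical: `classify`, `AmbiguousSnippet`, the token pattern, and the `.type` suffix are illustrative assumptions, not M5 code, and it is a token scanner rather than a real C++ expression parser. It drops comments, treats each statement's reads as happening before its store, counts a read as a source only if no earlier statement wrote the operand, treats compound assignment and `++`/`--` as read plus write, and raises instead of guessing for conditional writes and bare call arguments.

```python
import re


class AmbiguousSnippet(Exception):
    """Raised instead of guessing when an operand can't be classified."""


# Identifiers (optional .type suffix), numbers, and C operators; longer
# operators come first so that '<<=' is not split into '<' and '<='.
TOKEN_RE = re.compile(
    r'[A-Za-z_]\w*(?:\.\w+)?'
    r'|\d\w*'
    r'|<<=|>>=|[-+*/%&|^]=|==|!=|<=|>=|&&|\|\||<<|>>|\+\+|--'
    r'|[=(){};,?:<>!~+\-*/%&|^\[\]]'
)
ASSIGN_OPS = {'=', '+=', '-=', '*=', '/=', '%=', '&=', '|=', '^=', '<<=', '>>='}
KEYWORDS = {'if', 'else', 'while', 'for', 'switch', 'return', 'sizeof'}


def classify(code, operand_names):
    """Return (sources, dests), or raise AmbiguousSnippet."""
    code = re.sub(r'/\*.*?\*/', ' ', code, flags=re.S)  # block comments
    code = re.sub(r'//[^\n]*', ' ', code)               # line comments
    toks = TOKEN_RE.findall(code)

    sources, dests, written = set(), set(), set()
    reads, writes = set(), set()  # accesses in the current statement
    parens = []                   # 'call' or 'group' per open paren
    braces = []                   # per open brace: is the block conditional?
    if_depth = None               # paren depth where an if condition opens
    expect_body = in_single = False

    def flush():
        # The right-hand side is evaluated before the store, so a read is a
        # source unless an earlier statement already wrote the operand.
        sources.update(reads - written)
        dests.update(writes)
        written.update(writes)
        reads.clear()
        writes.clear()

    for i, t in enumerate(toks):
        prev = toks[i - 1] if i > 0 else ''
        nxt = toks[i + 1] if i + 1 < len(toks) else ''
        if expect_body and t != '{':
            in_single, expect_body = True, False  # brace-less if/else body
        if t == 'if':
            if_depth = len(parens)
        elif t == 'else':
            expect_body = True
        elif t == '(':
            is_call = re.match(r'[A-Za-z_]', prev) and prev not in KEYWORDS
            parens.append('call' if is_call else 'group')
        elif t == ')':
            if not parens:
                raise AmbiguousSnippet('unbalanced parentheses')
            parens.pop()
            if if_depth == len(parens):
                if_depth, expect_body = None, True
        elif t == '{':
            flush()
            braces.append(expect_body or in_single)
            expect_body = in_single = False
        elif t == '}':
            if not braces:
                raise AmbiguousSnippet('unbalanced braces')
            flush()
            braces.pop()
        elif t == ';':
            flush()
            in_single = False
        elif t.split('.')[0] in operand_names:
            name = t.split('.')[0]
            if prev in ('(', ',') and nxt in (')', ',') and parens[-1:] == ['call']:
                raise AmbiguousSnippet(f'{name} is a bare call argument '
                                       '(possibly passed by reference)')
            is_write = nxt in ASSIGN_OPS or '++' in (prev, nxt) or '--' in (prev, nxt)
            if is_write and (in_single or any(braces)):
                raise AmbiguousSnippet(f'{name} is written conditionally')
            if not is_write or nxt != '=':  # compound ops and ++/-- also read
                reads.add(name)
            if is_write:
                writes.add(name)
    flush()
    return sources, dests


ops = {"Ra", "Rb", "Rc"}
src, dst = classify("Rc = Ra + Rb; Rc = Rc & 0xff; // Rb", ops)
print(sorted(src), sorted(dst))  # ['Ra', 'Rb'] ['Rc']
for snippet in ("if (Ra == 0) Rc = Rb;", "setFlags(Rc, Ra);"):
    try:
        classify(snippet, ops)
    except AmbiguousSnippet as err:
        print("rejected:", err)
```

Two choices here are deliberately conservative. Any write under an `if` is rejected, even when both branches write the operand, which is stricter than the one-branch rule Steve describes. Any operand passed bare as a call argument is rejected whether or not the callee actually takes it by reference, since, as Gabe points out, a snippet on its own has no prototype to consult.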
