That looks pretty good to me. I think if you want/need to be more sophisticated, we'd need to jump from regular expressions to a tokenized representation and apply some kind of mini-grammar (e.g., to handle nested [] pairs), which seems like a leap that would require some concrete justification.
Steve

On Tuesday, September 27, 2011, Gabe Black wrote:

> Hmm, how about this? The operand can be followed by an arbitrary number
> of items from the following list, in any order.
>
> 1. Alphanumeric characters.
> 2. A dot.
> 3. A [] pair, where everything from the [ to the ] (excluding more [s or
> ]s) is included. C++ syntax makes this pretty safe, I think.
>
> This picks up the array index case without too much fuss, although it's
> still pretty limited. It can handle multidimensional arrays, nested
> structure members, etc., but it can't handle members that are selected
> with function-like syntax, not likely to be a problem, or arrays whose
> index is selected from another array, i.e. arr[x[y]], also not
> likely to be a problem. This seems like it handles the important case
> while not handling more esoteric cases, and still stays pretty simple,
> more or less. I like it. What do you think?
>
> Gabe
>
> On 09/27/11 01:58, Gabe Black wrote:
> > I played with this a bit, and it turns out this is pretty tricky. If you
> > allow ".", whitespace and alphanumeric characters, things generally
> > work. If you wanted to do something with SIMD, though, and for instance
> > use an array index in a loop like so:
> >
> > for (int i = 0; i < 8; i++)
> >     Ra.bytes[i] = Rb.bytes[i] + Rc.bytes[i];
> >
> > it would incorrectly decide that Ra was a source because it would see
> > the [ and stop looking for a =.
> >
> > On the other hand, if you let it match anything except a comma,
> > semicolon or = on the way to a =, then in a case like this:
> >
> > if (Foo)
> >     Ra = Rb + Rc;
> > else
> >     Ra = Rb - Rc;
> >
> > it would incorrectly decide that Foo was a dest because there was no
> > comma or semicolon between it and the equals on the next line.
> >
> > Given that the first approach gets more things right and enables a major
> > use case (bitfields in control registers) I'm inclined to go with it,
> > but not being able to use it with SIMD, a second major use case, is a
> > serious drawback.
> >
> > Gabe
> >
> > On 09/24/11 18:07, Gabe Black wrote:
> >> Once the "_" vs "." change is tested by Ali and checked in, the next
> >> step to get generic operand types working is to address a deficiency in
> >> how is_src and is_dest are determined by the parser. Right now it uses
> >> this regular expression to determine if something is a dest, and if it's
> >> not being used as a dest it's a src. This is for a particular instance,
> >> so one operand can be both if it's used more than once.
> >>
> >> assignRE = re.compile(r'\s*=(?!=)', re.MULTILINE)
> >>
> >> Basically, what that does is described by this comment:
> >> # if the token following the operand is an assignment, this is
> >> # a destination (LHS), else it's a source (RHS)
> >>
> >> That's worked quite well, especially considering how simple it is, and
> >> I'd like to preserve both its accuracy and its simplicity. The problem,
> >> however, is that the operand name won't necessarily be the last thing
> >> before the equals if an operand is being used as a dest. As a simple
> >> example, if I wanted to set the foo field of the Bar operand, it might
> >> look like this:
> >>
> >> Bar.foo = 42;
> >>
> >> Here, I believe the regular expression above would determine that Bar
> >> was a source because .foo appeared immediately after it.
> >>
> >> There are two possible solutions I see for this so far. First, we could
> >> make the regular expression ignore "."s and identifiers in addition to
> >> whitespace on its way to the equals sign. Second, we could make it
> >> ignore *everything* on its way to the equals sign, except a "," or a ";"
> >> which would, roughly, denote the end of the expression.
> >>
> >> Neither of these approaches seems like it'll be foolproof, so I was
> >> wondering what you guys think. Can you think of any naturally occurring
> >> bit of code that would make one or the other get the wrong answer? The
> >> original wasn't foolproof either, but in practice it worked really
> >> well. I'd like to go for the same thing, so it's ok for it to be wrong
> >> sometimes. I just don't want there to be some common case where it
> >> messes up.
> >>
> >>
> >>
> >> Also, there's still the issue of how partial writes, like my example
> >> actually, are handled as far as being a dest or a src or both. In the
> >> example above, Bar is actually both a source and a destination because
> >> the non-foo bits are set to their old values. That changes, though, if
> >> those other bits are necessarily set later, or if Bar is one of the new
> >> types of MiscReg Ali added that accumulate partial writes over time,
> >> useful for fault bits in floating point. If the Bar.foo = 42 case makes
> >> Bar a dest, maybe all we need to do is add "Bar = Bar" where it needs to
> >> be both and leave the parser alone. This puts the burden on the ISA
> >> description, although it may need to be there anyway, and it feels a
> >> little hacky. I want to deal with this after the issue above, but I
> >> wanted to mention it since it'll probably be coming up next.
> >>
> >> Gabe
> >> ______________________________

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
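
For reference, here is a minimal sketch of the three-item rule proposed in
the thread (alphanumerics, a dot, or a non-nested [] pair between the
operand and the "="). Only assignRE comes from the thread. The names
suffix, destRE and classify are hypothetical, and locating operands with
a \b word-boundary search is an assumption made for illustration, not
how the actual parser finds them.

```python
import re

# From the thread: the existing check, applied right after the operand name.
assignRE = re.compile(r'\s*=(?!=)', re.MULTILINE)

# Hypothetical: any number of items from the proposed list, in any order:
#   1. an alphanumeric character, 2. a dot, 3. a [] pair with no nested [ or ].
suffix = r'(?:[A-Za-z0-9]|\.|\[[^\[\]]*\])*'
destRE = re.compile(suffix + r'\s*=(?!=)', re.MULTILINE)

def classify(code, name):
    """Return 'dest' or 'src' for each occurrence of operand `name` in `code`."""
    results = []
    # Assumption: operands are found by a simple word-boundary search.
    for m in re.finditer(r'\b' + re.escape(name) + r'\b', code):
        is_dest = destRE.match(code, m.end()) is not None
        results.append('dest' if is_dest else 'src')
    return results

simd = "for (int i = 0; i < 8; i++)\n    Ra.bytes[i] = Rb.bytes[i] + Rc.bytes[i];"
branch = "if (Foo)\n    Ra = Rb + Rc;\nelse\n    Ra = Rb - Rc;"

print(classify("Bar.foo = 42;", "Bar"))   # ['dest']
print(classify(simd, "Ra"))               # ['dest']
print(classify(simd, "Rb"))               # ['src']
print(classify(branch, "Foo"))            # ['src']
print(classify("arr[x[y]] = 1;", "arr"))  # ['src'], the known nested-index miss
```

The last call shows the limitation already noted in the thread: a nested
index such as arr[x[y]] is not matched, so arr is reported as a source.
Whitespace between suffix items (for example "Ra.bytes [i]") is also not
covered by this sketch, since the proposed list does not include it.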
