That looks pretty good to me.  I think if you want/need to be more
sophisticated, we'd need to jump from regular expressions to a tokenized
representation and apply some kind of mini-grammar (e.g., to handle nested
[] pairs), which seems like a leap that would require some concrete
justification.
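
To make the nested [] point concrete, here is a rough sketch of the kind of thing that would be needed. It is not a real tokenizer or grammar, only a character scanner that counts bracket depth, and the function name followed_by_assignment is made up for illustration; nothing like it exists in the parser today as far as this thread shows.

```python
def followed_by_assignment(code, pos):
    """Hypothetical scanner. Starting just past an operand name, skip
    identifier characters, dots and balanced [] groups (tracking nesting
    depth), then report whether a lone '=' comes next."""
    depth = 0
    i = pos
    n = len(code)
    while i < n:
        c = code[i]
        if c == '[':
            depth += 1
        elif c == ']':
            if depth == 0:
                return False  # a ']' with no matching '['
            depth -= 1
        elif depth == 0 and not (c.isalnum() or c in '._'):
            break  # outside brackets, stop at anything else
        i += 1
    if depth != 0:
        return False  # ran off the end inside a [] group
    # Skip whitespace before the candidate '='.
    while i < n and code[i].isspace():
        i += 1
    # A lone '=' is an assignment; '==' is a comparison.
    return i < n and code[i] == '=' and code[i + 1:i + 2] != '='

print(followed_by_assignment("arr[x[y]] = 1;", 3))      # True
print(followed_by_assignment("v = arr[x[y]] + 1;", 7))  # False
```

Even this small version needs explicit state for the nesting depth, which is the jump in complexity mentioned above.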

Steve

On Tuesday, September 27, 2011, Gabe Black wrote:

> Hmm, how about this? The operand can be followed by an arbitrary number
> of items from the following list, in any order.
>
> 1. Alphanumeric characters.
> 2. A dot.
> 3. A [] pair, where everything from the [ to the ] (excluding more [s or
> ]s) is included. C++ syntax makes this pretty safe, I think.
>
> This picks up the array index case without too much fuss, although it's
> still pretty limited. It can handle multidimensional arrays, nested
> structure members, etc., but it can't handle members that are selected
> with function-like syntax (not likely to be a problem) or arrays whose
> index is taken from another array, i.e. arr[x[y]] (also not likely to
> be a problem). This seems like it handles the important case
> while not handling more esoteric cases, and still stays pretty simple,
> more or less. I like it. What do you think?
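
For concreteness, a minimal sketch of how the three-item rule above might be written as a Python regular expression. The names proposedRE and is_dest are hypothetical, and two details are assumptions rather than part of the proposal: "_" is treated as alphanumeric (via \w), and whitespace is still allowed just before the "=" as in the existing assignRE.

```python
import re

# Assumed rendering of the rule: any number of identifier characters,
# dots, or non-nested [...] groups, then optional whitespace, then "="
# not followed by another "=".
proposedRE = re.compile(r'(?:[\w.]|\[[^\[\]]*\])*\s*=(?!=)', re.MULTILINE)

def is_dest(code, operand, pattern=proposedRE):
    # Hypothetical helper: True if some occurrence of `operand` is
    # followed by text matching `pattern`.
    for m in re.finditer(r'\b%s\b' % re.escape(operand), code):
        if pattern.match(code, m.end()):
            return True
    return False

simd = "Ra.bytes[i] = Rb.bytes[i] + Rc.bytes[i];"
print(is_dest(simd, "Ra"))               # True
print(is_dest(simd, "Rb"))               # False
print(is_dest("m[i][j].x = 0;", "m"))    # True
print(is_dest("arr[x[y]] = 1;", "arr"))  # False (nested [] not handled)
```

The last call shows the arr[x[y]] limitation described above: the inner "[" stops the [] item from matching, so arr is treated as a source.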
>
> Gabe
>
> On 09/27/11 01:58, Gabe Black wrote:
> > I played with this a bit, and it turns out this is pretty tricky. If you
> > allow ".", whitespace and alphanumeric characters, things generally
> > work. If you wanted to do something with SIMD, though, and for instance
> > use an array index in a loop like so:
> >
> > for (int i = 0; i < 8; i++)
> >     Ra.bytes[i] = Rb.bytes[i] + Rc.bytes[i];
> >
> > it would incorrectly decide that Ra was a source because it would see
> > the [ and stop looking for a =.
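
A small sketch of this failure, assuming the first approach is written as the pattern below. The names relaxedRE and is_dest are hypothetical, and the exact character class is a guess at what "allow '.', whitespace and alphanumeric characters" looks like in practice.

```python
import re

# Assumed form of the first approach: skip identifier characters, dots
# and whitespace, then require "=" not followed by another "=".
relaxedRE = re.compile(r'[\w.\s]*=(?!=)', re.MULTILINE)

def is_dest(code, operand, pattern=relaxedRE):
    # Hypothetical helper: True if some occurrence of `operand` is
    # followed by text matching `pattern`.
    for m in re.finditer(r'\b%s\b' % re.escape(operand), code):
        if pattern.match(code, m.end()):
            return True
    return False

print(is_dest("Bar.foo = 42;", "Bar"))  # True
print(is_dest("Ra.bytes[i] = Rb.bytes[i] + Rc.bytes[i];", "Ra"))  # False
```

The bitfield case works, but in the SIMD line the "[" ends the skipped run before any "=" is reached, so Ra is left as a source.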
> >
> > On the other hand, if you let it match anything except a comma,
> > semicolon or = on the way to a =, then in a case like this:
> >
> > if (Foo)
> >     Ra = Rb + Rc;
> > else
> >     Ra = Rb - Rc;
> >
> > It would incorrectly decide that Foo was a dest because there was no
> > comma or semicolon between it and the equals on the next line.
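
The same kind of sketch for the second approach, again with hypothetical names (greedyRE, is_dest) and an assumed spelling of "anything except a comma, semicolon or =":

```python
import re

# Assumed form of the second approach: skip anything that is not a
# comma, semicolon or "=", then require "=" not followed by another "=".
greedyRE = re.compile(r'[^,;=]*=(?!=)', re.MULTILINE)

def is_dest(code, operand, pattern=greedyRE):
    # Hypothetical helper: True if some occurrence of `operand` is
    # followed by text matching `pattern`.
    for m in re.finditer(r'\b%s\b' % re.escape(operand), code):
        if pattern.match(code, m.end()):
            return True
    return False

code = "if (Foo)\n    Ra = Rb + Rc;\nelse\n    Ra = Rb - Rc;"
print(is_dest(code, "Foo"))  # True, which is the wrong answer
print(is_dest(code, "Ra"))   # True
print(is_dest(code, "Rb"))   # False
```

Nothing between Foo and the first "=" is a comma or semicolon, so the skipped run spans the ")" and the newline and Foo is flagged as a dest.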
> >
> > Given that the first approach gets more things right and enables a major
> > use case (bitfields in control registers) I'm inclined to go with it,
> > but not being able to use it with SIMD, a second major use case, is a
> > serious drawback.
> >
> > Gabe
> >
> > On 09/24/11 18:07, Gabe Black wrote:
> >> Once the "_" vs "." change is tested by Ali and checked in, the next
> >> step to get generic operand types working is to address a deficiency in
> >> how is_src and is_dest are determined by the parser. Right now it uses
> >> this regular expression to determine if something is a dest, and if it's
> >> not being used as a dest it's a src. This is for a particular instance,
> >> so one operand can be both if it's used more than once.
> >>
> >> assignRE = re.compile(r'\s*=(?!=)', re.MULTILINE)
> >>
> >> Basically, what that does is implement the rule described by this comment:
> >> # if the token following the operand is an assignment, this is
> >> # a destination (LHS), else it's a source (RHS)
> >>
> >> That's worked quite well, especially considering how simple it is, and
> >> I'd like to preserve both its accuracy and its simplicity. The problem,
> >> however, is that the operand name won't necessarily be the last thing
> >> before the equals if an operand is being used as a dest. As a simple
> >> example, if I wanted to set the foo field of the Bar operand, it might
> >> look like this:
> >>
> >> Bar.foo = 42;
> >>
> >> Here, I believe the regular expression above would determine that Bar
> >> was a source because .foo appeared immediately after it.
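
A quick check of that claim, using the assignRE quoted above. The is_dest helper is a hypothetical stand-in for however the parser actually applies the pattern after each operand match.

```python
import re

# The pattern quoted above: optional whitespace, then "=" not followed
# by another "=".
assignRE = re.compile(r'\s*=(?!=)', re.MULTILINE)

def is_dest(code, operand, pattern=assignRE):
    # Hypothetical helper: True if some occurrence of `operand` is
    # immediately followed by text matching `pattern`.
    for m in re.finditer(r'\b%s\b' % re.escape(operand), code):
        if pattern.match(code, m.end()):
            return True
    return False

print(is_dest("Ra = Rb + Rc;", "Ra"))   # True
print(is_dest("Ra == Rb", "Ra"))        # False
print(is_dest("Bar.foo = 42;", "Bar"))  # False
```

The last line is the problem case: ".foo" sits between Bar and the "=", so the match fails and Bar falls through to being a source.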
> >>
> >> There are two possible solutions I see for this so far. First, we could
> >> make the regular expression ignore "."s and identifiers in addition to
> >> whitespace on its way to the equals sign. Second, we could make it
> >> ignore *everything* on its way to the equals sign, except a "," or a ";"
> >> which would, roughly, denote the end of the expression.
> >>
> >> Neither of these approaches seems like it'll be foolproof, so I was
> >> wondering what you guys think. Can you think of any naturally occurring
> >> bit of code that would make one or the other get the wrong answer? The
> >> original wasn't foolproof either, but in practice it worked really
> >> well. I'd like to go for the same thing, so it's ok for it to be wrong
> >> sometimes. I just don't want there to be a common case where it
> >> messes up.
> >>
> >>
> >>
> >> Also, there's still the issue of how partial writes, like my example
> >> actually, are handled as far as being a dest or a src or both. In the
> >> example above, Bar is actually both a source and a destination because
> >> the non-foo bits are set to their old values. That changes, though, if
> >> those other bits are necessarily set later, or if Bar is one of the new
> >> types of MiscReg Ali added that accumulate partial writes over time,
> >> useful for fault bits in floating point. If the Bar.foo = 42 case makes
> >> Bar a dest, maybe all we need to do is add "Bar = Bar" where it needs to
> >> be both and leave the parser alone. This puts the burden on the ISA
> >> description, although it may need to be there anyway, and it feels a
> >> little hacky. I want to deal with this after the issue above, but I
> >> wanted to mention it since it'll be coming up next, probably.
> >>
> >> Gabe
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev