Quoting Ali Saidi <sa...@umich.edu>:
Hi Gabe,
On Jul 12, 2011, at 6:27 AM, Gabe Black wrote:
Hey folks, I sort of have an X86_FS O3 regression working! Sort of yay!
The problem is that this recent change:
O3: Fix up pipelining icache accesses in fetch stage to function properly
seems to break it and make it hang indefinitely. It was recent enough
that I had everything working, pulled this in, and then mysteriously it
started hanging. I don't think this code was sent out for review in
this form, although I doubt I would have jumped at the opportunity, and
I do remember an earlier version being sent out.
The code did get sent out for review and reviewed about three weeks ago.
http://reviews.m5sim.org/r/746/
Ah, ok, you're right. I didn't see it in my email (yesterday) so I
assumed it never went up.
It's late, and it's not immediately clear to me if there's something
wrong with this change, or if it subtly modifies what one of my own
patches does. It's also possible that this modifies some aspect of timing
that exposes a bug in X86.
One potential problem there might be, though, is that pipelined ifetches
might implicitly act like a cache which isn't necessarily maintained
like one. Maybe fetch is getting stale data which is why it gets stuck?
Or my complaint about interrupts not getting through might have happened
after all?
The pipelined fetches might act like a cache, but not much more so
than the fetch stage itself (that keeps an entire cache line
around), or the rest of the pipeline that doesn't notice after an
instruction is fetched if it's written to. If self modifying code is
a problem (I know it's allowed on x86, I don't know how widely used
it is) then it's simply luck at this point. It could be an interrupt
issue or it could be something else.
I definitely wouldn't call it common, but I'm sure there are examples
where it could cause an issue.
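To make the stale-data concern concrete, here is a minimal toy sketch of a fetch buffer that keeps a whole line around and never notices a later write to it. This is not gem5 code: FetchBuffer, LINE_SIZE, demo, and the instruction strings are all hypothetical names made up for illustration, and the model ignores timing entirely.

```python
# Toy model only, not gem5 code: every name here is hypothetical.
LINE_SIZE = 4  # instructions per "cache line" in this toy model


class FetchBuffer:
    """Keeps a copy of one line of instruction memory, with no invalidation."""

    def __init__(self, memory):
        self.memory = memory
        self.line_addr = None
        self.line = []

    def fetch(self, addr):
        base = addr - addr % LINE_SIZE
        if base != self.line_addr:
            # Miss: copy the whole line out of memory.
            self.line_addr = base
            self.line = list(self.memory[base:base + LINE_SIZE])
        # Hit: serve the buffered copy, even if memory changed since the fill.
        return self.line[addr - base]


def demo():
    memory = ["nop", "nop", "jmp 0", "nop"]
    buf = FetchBuffer(memory)
    first = buf.fetch(0)   # fills the buffer with the line at address 0
    memory[2] = "hlt"      # a store rewrites an instruction in that line
    stale = buf.fetch(2)   # the buffered copy still holds the old instruction
    return first, stale, memory[2]
```

In this sketch the second fetch returns the old instruction while memory already holds the new one, which is the kind of behaviour self-modifying code could trip over if nothing snoops or invalidates the buffered line.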
I'd really like to get my regression checked in so I'd really like to
assume this change is to blame and to back it out. I really don't want
to have to pick at it and figure out what's going on. At the same time,
I'd be making a major assumption about my own patches and I'd push the
annoyance of finding the problem onto the other guy. It would be great
if there was a magical way to determine whose fault it was ahead of time
and to just make whoever that was take care of it. What's a good way to
handle this?
We have run over 5000 hours of O3 simulation in the last few weeks
with the fetch pipelining patch and haven't discovered any
issues with the patch, so I'm inclined to believe that the patch is
fine. My guess would be that there are still some subtle bugs in
x86/o3. The fetch patch does make the pipeline more aggressive, so
there will be fewer bubbles and that could have exposed some issue
especially when combined with the branch predictor patch. Between
the two of them performance improves substantially. I don't think
there is a magic-bullet way to handle this, unfortunately.
Yeah, I think you're right. It's probably an x86 bug so I guess I'll
have to figure out what's going on. I was hoping things were mostly
taken care of since it seemed to work, but I guess not.
Gabe
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev