Philosophically, I agree with you, Gabe. A clean "squashing" solution in O3
would be ideal, although defining "clean" for O3 is probably a subjective
process in itself. And yes, whenever I do run into problems with the CPU
models, it's usually some weird squashing condition (branch mispredicts,
syscalls in SE mode, interrupts in FS, etc.).

With that said, I'd never flatly say "commit a hack". But if the code is
confusing in the first place and someone writes a hack to get a problem case
working, then that gives us some confidence that a cleaner solution
implementing the same thing will work, and it also provides a reference
point for that cleaner solution.

The fact that gem5 has so many developers "guards" the repo, in my
opinion, and the diligence we have with code review keeps gem5 maintainable
(Steve's "necessarily complex" quote probably applies here somewhere in
this email!).

So in this case, since Nilay posted the fix to the reviewboard, we (the
gem5 developers) can try to make a decision about the tradeoffs and figure
out whether it's worth it to replicate the change in a "clean" way or push
it to the repo as is.

Well, that's my view anyway... Now off my soapbox!

-Korey

On Mon, Jan 2, 2012 at 1:36 AM, Gabe Black <[email protected]> wrote:

> Yes, many of the things I do for gem5 can be very frustrating.
>
> A hack that works is not necessarily any better than code that doesn't
> work because it will have to be maintained and worked with/around for a
> long time. It can cause more damage than what it's supposed to be
> fixing, and probably will.
>
> Gabe
>
> On 12/30/11 04:03, Korey Sewell wrote:
> > I can't vouch for reading all the emails, but I have gone through this
> > whole thread (which dates back to Nov. 29th).
> >
> > Also, I'm not all that familiar with x86, so maybe this excludes me from
> > understanding the problem at the detailed level, but I think I am starting
> > to get a good grasp of the general squashing problem here (basically
> > maintaining squash state through exception events).
> >
> > My concern is that if you don't literally "fix" the problem first, you can
> > get caught up in the minutiae of making this big grand sweeping change and
> > then have no good way to say if "the fix" fixes anything in the first place.
> >
> > If Nilay or anyone could get something to the reviewboard that worked, hack
> > or not, then that would be a good step toward making the "clean" change
> > that I think you're referring to, Gabe. We don't have to commit the code, but
> > on a 1st pass working is better than "not working", right? :)
> >
> > (Gabe, I do understand it can be frustrating explaining the same things
> > over/over again.)
> >
> > On Fri, Dec 30, 2011 at 3:48 AM, Gabe Black <[email protected]> wrote:
> >
> >> If you read my emails the problem would already be identified and
> >> understood, because I did that weeks or even months ago and explained it
> >> multiple times. A hack fix is not ok. A hack fix is why this is still
> >> broken in the first place. That's also something I explained in my
> >> emails.
> >>
> >> Gabe
> >>
> >> On 12/30/11 02:50, Korey Sewell wrote:
> >>> I agree with you Gabe that the squashing mechanism could be cleaned up.
> >>>
> >>> But I'd also suggest that Nilay should understand/identify the problem
> >>> first and then implement a first-pass fix without a big squashing
> >>> revamp (if possible).
> >>>
> >>> That way, when we (Nilay, you, me, whoever) get to revamping the squash
> >>> code, there is at least a set test case and trace we can use to debug
> >>> with.
> >>>
> >>> On Fri, Dec 30, 2011 at 2:30 AM, Gabe Black <[email protected]> wrote:
> >>>> On 12/05/11 05:24, Gabe Black wrote:
> >>>>> On 12/03/11 13:02, Nilay Vaish wrote:
> >>>>>> On Wed, 30 Nov 2011, Gabriel Michael Black wrote:
> >>>>>>
> >>>>>>> That may be the same thing that's happening with Ali's branch
> >>>>>>> predictor patch. With Ruby, execution changes enough to hit one of
> >>>>>>> the broken squashing cases. The Ruby integration is probably working.
> >>>>>>>
> >>>>>>> Gabe
> >>>>>>>
> >>>>>>> Quoting Nilay Vaish <[email protected]>:
> >>>>>>>
> >>>>>>>> Gabe, when I boot FS with O3 CPU and Ruby, I get the following
> >>>>>>>> output on the terminal of the simulated system.
> >>>>>>>>
> >>>>>>>> EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
> >>>>>>>> VFS: Mounted root (ext2 filesystem).
> >>>>>>>> Freeing unused kernel memory: 232k freed
> >>>>>>>> init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp 00007fff38fa81b8 error 15
> >>>>>>>> init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp 00007fff38fa81b8 error 15
> >>>>>>>>
> >>>>>>>> The segfault message keeps appearing. Do you know why this might be
> >>>>>>>> happening?
> >>>>>>>>
> >>>>>> Gabe, how can I confirm this? Is there something that I can do to
> >>>>>> resolve the problem with branch prediction?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Nilay
> >>>>>> _______________________________________________
> >>>>>> gem5-dev mailing list
> >>>>>> [email protected]
> >>>>>> http://m5sim.org/mailman/listinfo/gem5-dev
> >>>>> I'm fairly confident that's what's going on. The stack address is in
> >>>>> user space and the instruction pointer is in kernel space. The page
> >>>>> fault is from near the ip, and the error code is 15 which means, if I'm
> >>>>> not mistaken, a permission problem on fetch. You can't easily fix the
> >>>>> problem, but if you want to get started the first step would be to
> >>>>> clean up the squashing mechanisms in O3 like I brought up in that email
> >>>>> a while ago. The real problem is that squashing doesn't always preserve
> >>>>> enough state (the macroop instance specifically) in all situations, and
> >>>>> that the squashing stuff is too ad-hoc and all over the place to really
> >>>>> fix it correctly and know that it's correct. I'd thought I fixed it
> >>>>> before when I fixed one particular squash path, but obviously I didn't
> >>>>> get it all.
> >>>>>
> >>>>> Gabe
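
For reference, the "error 15" reading above can be checked with a small
sketch. It assumes the kernel prints the error field in hexadecimal (as the
Linux segfault message does) and uses the x86 page-fault error-code bit
layout; the names PF_FLAGS and decode_page_fault are made up for this
illustration and are not part of gem5 or the kernel.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
# PF_FLAGS and decode_page_fault are illustrative names, not a real API.
PF_FLAGS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_page_fault(code):
    """Return a list of human-readable descriptions for an error code."""
    out = []
    for mask, if_set, if_clear in PF_FLAGS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# Assumption: the "15" in the segfault line is hexadecimal, i.e. 0x15.
print(decode_page_fault(int("15", 16)))
```

Under that assumption the value decodes to a protection violation on an
instruction fetch made from user mode, which is consistent with a user-mode
process trying to fetch from a kernel address.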
> >>>> What was unclear about this email and the ones before it? Did you not
> >>>> believe me for some reason? You've spent about a month partially
> >>>> rediscovering what I explained in them. I've already said how this
> >>>> needs to be fixed.
> >>>>
> >>>> Gabe



-- 
- Korey
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev