Re: Seeking OK to commit KSE MIII-again

Jake Burkholder Fri, 31 May 2002 09:58:11 -0700

Apparently, On Thu, May 30, 2002 at 06:56:30PM -0700,
        Julian Elischer said words to the effect of;


> > > + /* Note: use of M_WAITOK means it won't fail. */
> > > + newkse->ke_pcb =
> > > +     &(((struct md_store *)(newkse->ke_mdstorage))->mds_pcb);
> > > + newkse->ke_frame =
> > > +     &(((struct md_store *)(newkse->ke_mdstorage))->mds_frame);
> > > +
> > > + /* Copy the upcall pcb. Kernel mode & fp regs are here. */
> > > + bcopy(td->td_pcb, newkse->ke_pcb, sizeof(struct pcb));
> > > +
> > > + /* This copies most of the user mode register values. */
> > > + bcopy(td->td_frame, newkse->ke_frame, sizeof(struct trapframe));
> > > +}
> > 
> > ke_frame, ke_pcb and ke_mdstorage should all be in a machine dependent
> > struct mdkse, like mdproc.  The fact that the storage is large enough
> > to warrant using malloc is machine dependent, so it should not be a
> > pointer.  I would be inclined to just embed a trapframe.
> 
> errrr... ke_mdstorage is just a pointer to the mdstorage
> as are the others.. I don't want to include an md structure into 
> the KSE.. it's big enough as it is.. Every process has a KSE
> but only KSE-mode processes have the extra mdstorage area.
> 
> Do you feel strongly about this?

I do.  The point is that if its a struct you can do what ever you want;
just put the pointers in it.

struct mdkse {
        void *md_store;
        struct trapframe *md_frame;
        struct pcb *md_pcb;
};

I think that the upcall state should not need to be more than 3 or 4
pointers saved in the kernel, with no extra malloced stuff.

> 
> > 
> > The pcb should not be needed at all here; all of the meaningful kernel
> > mode register values are set below.  Capturing the whole execution
> > context at the time of the kse_new call (floating point registers,
> > debug registers) may be expensive and I don't think is worth doing.
> 
> Yes I started out with the PCB there but as I went I found I was needing
> less and less of it. I even have a comment to that effect somewhere..
> At this stage I still have it only because I wanted to make sure that
> I had good defaults for anything that I wasn't sure about.. 

Well, the defaults are documented in the hardware documentation for
a given platform...  Saving the whole pcb is not always practical,
it may be huge.

> 
> Also I haven't figured out what to do about FP registers
> and I may want to stuff them there at some stage...
> (not sure yet)
> 
> > 
> > The whole trick of a system call that returns multiple times is
> > dubious.  The fact that it works at all is machine dependent; for
> > sparc64 it needs wierd hacks in the kernel like is done for vfork.
> > It would be better to just register an upcall stack and entry point
> > with the kernel, like how signals work.  This would make mdkse even
> > smaller.
> 
> It's effectively the same thing..
> except it allows the function to have persistent state in all it's 
> local variables and arguments. Which is REALLY useful in the UTS.
> As for hacks.. we have the code in vfork, no?
> :-)
> (actually the code actually uses fork_return() to do the returns so if
> your hack is in there we get it for free.)

The hack is in cpu_fork.  The problem is that in order to save the call
safe registers, on entry to the kernel the kernel pushes a frame onto the
_user_ stack.  This saves the call safe registers that are active at the
time of the call to the kse_new asm stub in libc.  When the system call
returns the frame is popped off again and the registers are restored
(again by the kernel, not the user code).  However, the stack space used
to save the frame is now below the stack pointer, and will be clobbered by
interrupts, page faults, or function calls.  So the next time the kse_new
call returns, the frame that was saved on the original call has been
overwritten by normal stack usage, and the call safe registers at the
time of the orignal call have been clobbered (the memory that they were
saved in that is).

Some background: obviously the kernel has to be really careful when storing
to the user stack on entry to the kernel.  If that part of the stack is not
mapped, or if the stack pointer is corrupted, it can trigger a page fault or
an alignment fault.  These are detected very early (before calling C code,
because we haven't even switched to the kernel stack yet), and the register
window is written to the pcb, which will be copied out again on return from
the kernel so everything looks normal.

What we do for vfork is copy the frame from the user stack into the parent's
pcb, and arrange for it to be copied out when the child exits, restoring the
volatile part of the stack.  This means that we need a pcb to save the frame
in, as well as the trapframe.  The pcb is huge on sparc64, as much as 5K to
6K, depending on the number of windows supported by the cpu.  So we'd have
to copy almost a full page of memory for every upcall, whereas if they use
a signal style trampoline, all you need is a stack and pc, and some arguments.

> > > + union kse_td_ctx {
> > > +         struct {
> > > +                 int                     if_onstack;
> > > +                 struct intrframe        if_if;
> > > +         } intrfrm;
> > > +         struct {
> > > +                 int                     tf_onstack;
> > > +                 int                     tf_gs;
> > > +                 struct trapframe        tf_tf;
> > > +         } tfrm;
> > > +         mcontext_t      mcontext;
> > > + };
> > 
> > Please do not export trapframe and intrframe to userland like this;
> > just use mcontext.  Doing this makes it part of the kernel->userland
> > ABI more so than it already is, debuggers use it for core dumps which
> > is bad enough.  mcontext is already there for this purpose and required
> > to have a stable ABI.
> 
> They are only exported this way because Dan Eischen and I were
> experimenting with what context was needed.
> We have basically decided that the  mcontext is the one we want
> except that I am worried about FP regs.
> 
> intrfrm and tfrm will be deleted well before anyone else
> gets to play with it..
> 
> thanks for reminding me though..

Ok.

[...]

> > [...]
> > 
> > Before I spend much time on the machine dependent code for sparc64
> > I would like to see a well defined kernel->userland interface, with
> > long term ABI issues taken into consideration.  I'm not convinced
> > that the current kse_new scheme will work; I would much rather see
> > just an entry point, which is where the upcalls begin executing.
> > This is how the netbsd upcall works as far as I can tell.
> 
> But it's a lot harder to set the correct context in userland..
> you need to modify the userland UTS setup every time 
> there is a change in the compiler framing for example.
> This way, it's automatic.. You are restoring the registers
> in exactly the way you want them, where the other way, the kernel
> code doing the work needs to know what compiler is running the 
> userland and how it's optimising variables etc.
> 
> this is "self correcting" :-)

I don't really get what you mean.  What I want is for upcalls to look
like a function call.

void
upcall_handler(...)  <-- upcalls start executing here.
{
...
}

int
main(void)
{
...
        set_upcall_handler(upcall_stack, upcall_handler, ...);
...
}

Passing arguments to the upcall handler is easy, the calling conventions
are part of the ABI for the architecture its running on and can't change.
We already do this for signals.  You could easily have a pointer that's
passed opaquely to each upcall for keeping state.  The state just needs
to be organized into a structure that's easily passed around, rather
than arbitrary stack variables.  You can also then copy out arbitrary
things to the upcall stack and pass them to the handler, without needing
to have space already passed in in the original kse_new call.

> 
> > 
> > It is much more difficult to ensure that all the register values
> > end up the same on each return from the system call on sparc64, due
> > to the way that register stack works.  The current test program
> > will not work at all, because setjmp, longjmp cannot be used to
> > switch the stack in the same way.
> 
> The library will not be using setjmp and longjmp in this way but 
> instead the setcontext() call that dan wrote for the current thread
> library. If that works it should be enough.
> (I'd like to investigate your comments though... can you explain
> more about why it's a problem? It sure simplifies things on most
> architectures I've done this on..

[see above]

It works for simple, stack based architectures.  It does not work as
well for anything more complicated.  Its best for the upcalls to
need as little state as possible.

Jake

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Re: Seeking OK to commit KSE MIII-again

Reply via email to