On Sun, Mar 29, 2015 at 2:50 PM, Philip Guenther <[email protected]> wrote:
> On Sun, 29 Mar 2015, Joel Rees wrote:
>> Is there any good reason for interleaving the return addresses with data
>> on the data/parameter stack in C? I know it's the tradition, from back
>> when it was all we could hope for to have one page per process, but that
>> has not been the case for many years, I think.
>
> It's easy, efficient,

I don't agree with those two assertions, but ease and efficiency
aren't really my concern. I'm thinking about return pointers getting
walked on when a buffer on the stack overflows.

> doesn't burn another register for a second stack,

I've often wondered whether the entire reason for having a frame
pointer is that you need something to lead you around the return
pointer.

Still, there is no reason to keep the frame link in a register. Dump
the saved parameter stack pointer on the flow-of-control stack, paired
with the return pointer, and you're done, if you think even that is
necessary. Or you could save the previous top of the parameter stack
below the local variables and temporaries on the parameter stack, which
is a little more fragile, if you really want it to look like a linked
list.
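A minimal C sketch of the first option, assuming a toy model in which each flow-of-control entry pairs a return address with the caller's saved parameter stack pointer. The type and function names (`control_entry`, `control_push`, `control_backtrace`) are hypothetical; this illustrates the data structure, it is not a proposed ABI. Note that a backtrace only has to walk the one array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one entry per nested call on the flow-of-control stack. */
struct control_entry {
    const void *return_addr;   /* what 'call' would push */
    size_t saved_psp;          /* caller's parameter stack pointer */
};

#define CONTROL_DEPTH 64

struct control_stack {
    struct control_entry entry[CONTROL_DEPTH];
    size_t top;                /* number of live entries */
};

/* Push a (return address, saved parameter SP) pair; returns 0 on overflow. */
static int control_push(struct control_stack *cs, const void *ret, size_t psp)
{
    if (cs->top == CONTROL_DEPTH)
        return 0;
    cs->entry[cs->top].return_addr = ret;
    cs->entry[cs->top].saved_psp = psp;
    cs->top++;
    return 1;
}

/* Pop the newest pair; the callee may restore saved_psp or just drop it. */
static struct control_entry control_pop(struct control_stack *cs)
{
    assert(cs->top > 0);
    return cs->entry[--cs->top];
}

/* Copy up to `max` entries, innermost call first. An unwinder needs only
   this array; it never chases a frame-pointer chain through the data stack. */
static size_t control_backtrace(const struct control_stack *cs,
                                struct control_entry *out, size_t max)
{
    size_t n = 0;
    for (size_t i = cs->top; i > 0 && n < max; i--)
        out[n++] = cs->entry[i - 1];
    return n;
}
```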

> and the ISA (instruction set architecture) may have direct support for it.

Looking at all the funky instructions and addressing modes that the
68020, 80286, and 80386 directly supported but that got left behind as
CPUs advanced, direct instruction support really isn't an argument, to
me.

> For example, in both i386 and x86_64 the 'call' and 'ret' instructions
> work with the same stack pointer, %sp, as the various 'push' and 'pop'
> instructions.

Yeah, it does feel like a waste not to use the instructions provided,
doesn't it?

>  If you use %sp for the stack with return addresses so you
> can use 'call' and 'ret', then what is the stack pointer for your
> arguments/locals stack?

Well, if we link the frame on one of the stacks instead of in a
register, the erstwhile frame pointer register isn't doing anything in
particular any more.

>  On i386, you're crying for registers already;
> losing another would be bad.

Back when SP and BP were sixteen bits and pointed into the same
segment unless overridden, there was definitely something unsettling
about using them independently. But if you consider that the 6809 was
even more restricted in stack space when you used S and U for two
separate stacks, you realize that, even now, most stacks aren't that
big anyway.

But that was then, this is now, and %esp and %ebp (%rsp and %rbp on
x86_64) aren't sixteen bits anymore. We can define regions that raise
access exceptions to tell us when our stacks and heaps collide. And are
we still even using the segment registers?
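A minimal POSIX sketch of such a no-access region, assuming `mmap` with `MAP_ANON` and `mprotect` are available (they are on OpenBSD and Linux): map a few pages, turn the lowest into a `PROT_NONE` guard, and confirm that a store into it traps. The function name `guard_page_traps` is hypothetical, and escaping the fault handler with `siglongjmp` is a demonstration trick, not something production code should lean on.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANON on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);      /* leave the faulting store behind */
}

/* Map `pages` pages, make the lowest one PROT_NONE, then store into it.
   Returns 1 if the store trapped, 0 if not, -1 on setup failure. */
static int guard_page_traps(size_t pages)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0 || pages < 2)
        return -1;
    size_t len = pages * (size_t)pg;
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    if (mprotect(base, (size_t)pg, PROT_NONE) != 0) {
        munmap(base, len);
        return -1;
    }

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);   /* some systems report SIGBUS */

    volatile int trapped = 0;
    if (sigsetjmp(fault_jmp, 1) == 0) {
        base[len - 1] = 1;                          /* usable top: fine */
        ((volatile unsigned char *)base)[0] = 1;    /* guard page: traps */
    } else {
        trapped = 1;
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(base, len);
    return trapped;
}
```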

I know older MMUs are kind of tight on the number of page tables you
can keep active, but are they that tight? This is one of my questions,
by the way.

> Hmm, wasn't there a shipping processor which explicitly had two stacks,
> something like this?  I have a vague memory that it may have been Itanium,
> but that could be a hallucination.

My impression was that the register stack backing store was assumed to
be allocated with the rest of the procedure frame in the memory stack,
but the documentation seems to have been written by lawyers who really
didn't want their end users to be able to pierce the patent haze. I
couldn't say for sure, and I never had actual hardware to check
against, and I really don't want to dig into that dinosaur any more.

Intel has always made things harder than necessary.

>> Adding code to the program preamble to reserve space for another stack
>> with mmap shouldn't be hard at all. Default address separation of about
>> a quarter to a half a gig should be reasonable in 32 bit address space,
>> at any rate. New compiler switches would be needed to tune the
>> separation. I'm pretty sure openbsd has the means to keep a largish
>> no-access region between the stacks.
>
> Ugh, knobs are bad if more than a tiny fraction of programs have to use
> them.

You won't hear me argue with you there. But make the default a quarter
gig of unallocated logical address space below each stack, and only a
tiny fraction of programs would need to change that.
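A sketch of what that preamble might do, under assumed names (`reserve_two_stacks`, `struct two_stacks`) and an assumed layout: reserve the whole range as `PROT_NONE`, then open up only the top `size` bytes of each half, so each downward-growing stack has `gap` bytes of inaccessible address space below it.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical layout, low to high addresses:
     [ gap | parameter stack | gap | control stack ]
   Both stacks grow downward; running off the bottom of either one
   lands in PROT_NONE space and faults. */
struct two_stacks {
    void *base;             /* start of the whole reservation */
    size_t total;           /* length of the whole reservation */
    uintptr_t param_top;    /* initial parameter stack pointer */
    uintptr_t control_top;  /* initial flow-of-control stack pointer */
};

/* `size` and `gap` must be multiples of the page size. Returns 0 on success. */
static int reserve_two_stacks(struct two_stacks *ts, size_t size, size_t gap)
{
    assert(size > 0 && gap > 0);
    size_t total = 2 * (gap + size);
    unsigned char *base = mmap(NULL, total, PROT_NONE,
                               MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    unsigned char *param_lo = base + gap;
    unsigned char *control_lo = param_lo + size + gap;
    if (mprotect(param_lo, size, PROT_READ | PROT_WRITE) != 0 ||
        mprotect(control_lo, size, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return -1;
    }
    ts->base = base;
    ts->total = total;
    ts->param_top = (uintptr_t)(param_lo + size);
    ts->control_top = (uintptr_t)(control_lo + size);
    return 0;
}

static void release_two_stacks(struct two_stacks *ts)
{
    munmap(ts->base, ts->total);
}
```

With the quarter-gig default, `gap` would be `(size_t)256 << 20`. Reserving `PROT_NONE` space generally costs address space rather than memory, which is why it is cheap on 64-bit targets and the real constraint on 32-bit ones.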

>> The call protocol itself should be simpler, although I might expect some
>> debate about which stack to push frame pointers to when pushing frame
>> pointers. The problem, I think, is in convincing the compiler to refrain
>> from moving the frame pointer to the stack pointer on function entry.
>> Maybe.
>
> Simpler?

Well, yeah, I think so.

> I doubt it.  To support exception processing and debugger
> unwinding of calls and displaying variables from them you'll need some way
> to successively peel call frames off *both* stacks.

On the caller side, push your arguments and perform an unadorned call.
Something like

    SUB SP, parameterbytes


On the called side, if the frame is being maintained, save the
parameter stack pointer on the flow-of-control stack. Do your stuff.
When you're done, if the compiler knows what it has on the stack
anyway, you don't need to restore the parameter stack pointer; just
drop the saved one. Perform an unadorned return.

You don't even need instruction set support of any kind to keep you
out of trouble.
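To make that protocol concrete, here is a toy simulation in C, assuming two plain arrays stand in for the parameter stack and the flow-of-control stack and an `int` stands in for a return address. All names are hypothetical. `push_args` plays the role of the caller's `SUB SP, parameterbytes` (read here as making room for arguments on the parameter stack), `do_call` and `do_return` are the unadorned call and return, and `sloppy_callee` overruns a local buffer to show that the damage stays on the parameter stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy machine, not an ABI. Both stacks grow downward, like x86 stacks. */
enum { PSTACK_BYTES = 256, CSTACK_SLOTS = 16 };

struct control_slot {
    int ret;             /* stands in for a return address */
    size_t saved_psp;    /* caller's parameter stack pointer */
};

struct machine {
    unsigned char pstack[PSTACK_BYTES];       /* arguments, locals, temporaries */
    size_t psp;                               /* parameter stack pointer (index) */
    struct control_slot cstack[CSTACK_SLOTS]; /* flow-of-control stack */
    size_t csp;                               /* control stack pointer (index) */
};

static void machine_init(struct machine *m)
{
    memset(m, 0, sizeof *m);
    m->psp = PSTACK_BYTES;   /* both stacks start empty at their high end */
    m->csp = CSTACK_SLOTS;
}

/* Caller: make room for the arguments and copy them in. */
static void push_args(struct machine *m, const void *args, size_t n)
{
    assert(n <= m->psp);
    m->psp -= n;
    memcpy(&m->pstack[m->psp], args, n);
}

/* Unadorned call: only the return point and the caller's psp are saved,
   and they go on the control stack, not next to the data. */
static void do_call(struct machine *m, int ret)
{
    assert(m->csp > 0);
    m->csp--;
    m->cstack[m->csp].ret = ret;
    m->cstack[m->csp].saved_psp = m->psp;
}

/* Unadorned return: this version restores the saved psp; a compiler that
   already knows its frame layout could simply drop it instead. */
static int do_return(struct machine *m)
{
    assert(m->csp < CSTACK_SLOTS);
    struct control_slot top = m->cstack[m->csp++];
    m->psp = top.saved_psp;
    return top.ret;
}

/* A callee with a classic bug: it copies `len` bytes into an 8-byte local
   without checking. The toy keeps the overrun inside the array so the demo
   itself stays well-defined; the overrun hits the caller's arguments only. */
static void sloppy_callee(struct machine *m, const char *src, size_t len)
{
    assert(m->psp >= 8);
    m->psp -= 8;                              /* char local[8] */
    assert(m->psp + len <= PSTACK_BYTES);
    memcpy(&m->pstack[m->psp], src, len);     /* BUG: len may exceed 8 */
}
```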

>> To those on the list who are intimate with the compiler(s), how difficult
>> would it be to change the function call protocol to push the program
>> counter to a separate stack from the parameters and locals?
>
> Heh, you're talking about creating a new ABI.  For difficulty level, look
> at the x32 ABI in Linux.  It's an alternative ABI for x86-long-mode,
> changing relatively few things from the amd64/x86_64 ABI, and it still
> took a huge effort.

This is what I'm asking about.

I have the impression that some of the stack-smashing mitigation
techniques in use in OpenBSD add a similar amount of complexity to the
call protocol.
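For comparison, the best-known of those mitigations, the stack-protector canary, adds a store in each protected function's prologue and a check in its epilogue. The C below is a simulation of that idea under hypothetical names and a fixed guard value (real implementations use a per-process random value); it is not what any compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame, in the order a downward-growing stack would place it:
   local buffer, then canary, then the saved return point. */
struct frame {
    char local[8];
    uint64_t canary;
    uint64_t saved_ret;
};

/* Fixed for the demo; a real canary is chosen randomly at startup. */
static const uint64_t guard_value = 0x5eed5eed5eed5eedULL;

/* Returns 1 if the frame survived intact, 0 if the overflow was caught. */
static int run_frame(const char *input, size_t len, uint64_t ret_point)
{
    struct frame f;
    f.saved_ret = ret_point;
    f.canary = guard_value;                 /* prologue: plant the canary */
    assert(len <= sizeof f);                /* keep the demo inside the object */
    memcpy((unsigned char *)&f + offsetof(struct frame, local),
           input, len);                     /* unchecked copy into local[] */
    return f.canary == guard_value;         /* epilogue: check before return */
}
```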

> You might want to grab a copy of the ELF ABI for a CPU you're interested
> in, read through it and see what sort of changes would be necessary for
> supporting a two-stack model.  Example code sequences for argument
> passing, relocations, stack unrolling, registers set on process entry,...

Considering the various conventions already in use, I'm not sure it
really changes the ABI significantly, but I'll go back to the ELF
docs and see if I'm missing something.

> Oooh, and then threads come into play, where programs expect to be able to
> specify a single size for the stack, so maybe you should have the two
> stacks grow towards each other from opposite ends of the allocated stack
> memory?

I'm hoping that the abomination of lightweight processes sharing stack
space can be avoided with an overall saner run-time. Clean up the
run-time enough and we shouldn't need that kind of threads.

But, yeah, if we have to have that kind of threads, we'd end up having
to determine the maximum call depth for every independent thread. But
don't we have to do that anyway?
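If we did have to support a single per-thread stack size, the layout suggested above (both stacks in one allocation, growing toward each other) might look like this minimal sketch. The names are hypothetical; the control stack grows up from the low end one pointer-sized entry at a time, the parameter stack grows down from the high end, and either push fails once the two would meet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One thread's stack memory shared by both stacks.
   Invariant: ctl_top <= param_low. */
struct shared_stack {
    unsigned char *mem;
    size_t size;
    size_t ctl_top;     /* bytes used at the low end (control stack) */
    size_t param_low;   /* lowest byte of the parameter stack; starts at size */
};

static void shared_init(struct shared_stack *s, unsigned char *mem, size_t size)
{
    assert(mem != NULL);
    s->mem = mem;
    s->size = size;
    s->ctl_top = 0;
    s->param_low = size;
}

/* Bytes still free between the two stacks. */
static size_t shared_free(const struct shared_stack *s)
{
    return s->param_low - s->ctl_top;
}

/* Push one return entry on the control stack; 0 means the stacks would meet. */
static int ctl_push(struct shared_stack *s, const void *ret)
{
    if (shared_free(s) < sizeof ret)
        return 0;
    memcpy(s->mem + s->ctl_top, &ret, sizeof ret);
    s->ctl_top += sizeof ret;
    return 1;
}

/* Reserve n bytes of arguments/locals; 0 means the stacks would meet. */
static int param_alloc(struct shared_stack *s, size_t n)
{
    if (shared_free(s) < n)
        return 0;
    s->param_low -= n;
    return 1;
}
```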

The flow-of-control stack, at any rate, is dead easy. One address per
nested call, or, if tracking the frames, a pair of addresses per
nested call.
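The sizing arithmetic is simple enough to write down. This sketch assumes a return address and a saved parameter stack pointer are each `sizeof(void *)` bytes; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for the flow-of-control stack at a given maximum call depth:
   one address per call, or an (address, saved parameter SP) pair per call. */
static size_t control_stack_bytes(size_t max_depth, int track_frames)
{
    size_t per_call = sizeof(void *) * (track_frames ? 2 : 1);
    assert(max_depth <= SIZE_MAX / per_call);   /* guard against overflow */
    return max_depth * per_call;
}
```

On a 64-bit target, 10,000 nested calls with frame tracking come to 10,000 x 16 = 160,000 bytes, about 156 KiB.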

>> Or am I speculating about a different world, still?
>
> Tomorrow is a different world, but only slightly so.

Which is my question put a different way: whether this is too much
of a jump from current practice. And I guess, in the end, I'll
probably be the one who needs to get familiar enough with one of
the compilers to answer it.

Just looking for some input and pointers from those who are already
familiar with the compilers. Thanks for helping me unpack my questions
a bit.

And since you suggest looking back over the ELF docs, I will.

-- 
Joel Rees

Be careful when you look at conspiracy.
Look first in your own heart,
and ask yourself if you are not your own worst enemy.
Arm yourself with knowledge of yourself, as well.
