On 8/8/19 10:27 AM, Catalin Marinas wrote:
> On Wed, Aug 07, 2019 at 01:38:16PM -0700, Dave Hansen wrote:
> Extending the interface is still possible even with the current
> proposal, by changing arg2 etc. We also don't seem to be consistent in
> sys_prctl().
We are not consistent because it took a long time to learn this lesson,
but I think this is a best-practice that we follow for new ones.
>> Also, shouldn't this be converted over to an arch_prctl()?
>
> What do you mean by arch_prctl()? We don't have such thing, apart from
> maybe arch_prctl_spec_ctrl_*(). We achieve the same thing with the
> {SET,GET}_TAGGED_ADDR_CTRL macros. They could be renamed to
> arch_prctl_tagged_addr_{set,get} or something but I don't see much
> point.
Silly me. We have an x86-specific:
SYSCALL_DEFINE2(arch_prctl, int , option, unsigned long , arg2)
I guess there's no ARM equivalent so you're stuck with the generic one.
> What would be better (for a separate patch series) is to clean up
> sys_prctl() and move the arch-specific options into separate
> arch_prctl() under arch/*/kernel/. But it's not really for this series.
I think it does make sense for truly arch-specific features to stay out
of the arch-generic prctl(). Yes, I know I've personally violated this
in the past. :)
>> What is the scope of these prctl()'s? Are they thread-scoped or
>> process-scoped? Can two threads in the same process run with different
>> tagging ABI modes?
>
> Good point. They are thread-scoped and this should be made clear in the
> doc. Two threads can have different modes.
>
> The expectation is that this is invoked early during process start (by
> the dynamic loader or libc init) while in single-thread mode and
> subsequent threads will inherit the same mode. However, other uses are
> possible.
If that's the expectation, it would be really nice to codify it.
Basically, you can't enable the feature if another thread is already
been forked off.
> That said, do we have a precedent for changing user ABI from the kernel
> cmd line? 'noexec32', 'vsyscall' I think come close. With the prctl()
> for opt-in, controlling this from the cmd line is not too bad (though my
> preference is still for the sysctl).
There are certainly user-visible things like being able to select
various CPU features.
>>> +When a process has successfully enabled the new ABI by invoking
>>> +prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE), the following
>>> +behaviours are guaranteed:
>>> +
>>> +- Every currently available syscall, except the cases mentioned in section
>>> + 3, can accept any valid tagged pointer. The same rule is applicable to
>>> + any syscall introduced in the future.
>>> +
>>> +- The syscall behaviour is undefined for non valid tagged pointers.
>>
>> Do you really mean "undefined"? I mean, a bad pointer is a bad pointer.
>> Why should it matter if it's a tagged bad pointer or an untagged bad
>> pointer?
>
> Szabolcs already replied here. We may have tagged pointers that can be
> dereferenced just fine but being passed to the kernel may not be well
> defined (e.g. some driver doing a find_vma() that fails unless it
> explicitly untags the address). It's as undefined as the current
> behaviour (without these patches) guarantees.
It might just be nicer to point out what this features *changes* about
invalid pointer handling, which is nothing. :) Maybe:
The syscall behaviour for invalid pointers is the same for both
tagged and untagged pointers.
>>> +- prctl(PR_SET_MM, ``*``, ...) other than arg2 PR_SET_MM_MAP and
>>> + PR_SET_MM_MAP_SIZE.
>>> +
>>> +- prctl(PR_SET_MM, PR_SET_MM_MAP{,_SIZE}, ...) struct prctl_mm_map fields.
>>> +
>>> +Any attempt to use non-zero tagged pointers will lead to undefined
>>> +behaviour.
>>
>> I wonder if you want to generalize this a bit. I think you're saying
>> that parts of the ABI that modify the *layout* of the address space
>> never accept tagged pointers.
>
> I guess our difficulty in specifying this may have been caused by
> over-generalising. For example, madvise/mprotect came under the same
> category but there is a use-case for malloc'ed pointers (and tagged) to
> the kernel (e.g. MADV_DONTNEED). If we can restrict the meaning to
> address space *layout* manipulation, we'd have mmap/mremap/munmap,
> brk/sbrk, prctl(PR_SET_MM). Did I miss anything?. Other related syscalls
> like mprotect/madvise preserve the layout while only changing permissions,
> backing store, so the would be allowed to accept tags.
shmat() comes to mind. I also did a quick grep for mmap_sem taken for
write and didn't see anything else obvious jump out at me.
>> It looks like the TAG_SHIFT and tag size are pretty baked into the
>> aarch64 architecture. But, are you confident that no future
>> implementations will want different positions or sizes? (obviously
>> controlled by other TCR_EL1 bits)
>
> For the top-byte-ignore (TBI), that's been baked in the architecture
> since ARMv8.0 and we'll have to keep the backwards compatible mode. As
> the name implies, it's the top byte of the address and that's what the
> document above refers to.
>
> With MTE, I can't exclude other configurations in the future but I'd
> expect the kernel to present the option as a new HWCAP and the user to
> explicitly opt in via a new prctl() flag. I seriously doubt we'd break
> existing binaries. So, yes TAG_SHIFT may be different but so would the
> prctl() above.
Basically, what you have is a "turn tagging on" and "turn tagging off"
call which are binary: all on or all off. How about exposing a mask:
/* Replace hard-coded mask size/position: */
unsigned long mask = prctl(GET_POSSIBLE_TAGGED_ADDR_BITS);
if (mask == 0)
// no tagging is supported obviously
prctl(SET_TAGGED_ADDR_BITS, mask);
// now userspace knows via 'mask' where the tag bits are