On Thu, 24 Oct 2024 at 02:22, David Laight <david.lai...@aculab.com> wrote: > > Would it be better to make the 'bogus' constant one that makes > all accesses fail?
Well, the bogus constant is actually used for other runtime cases too (well, right now that's only the dentry pointer, but could be more), it's not some kind of "just for the max user address". And for things like profiling, where the tools use the original object code, making the bogus constant really obvious is actually helpful. So I do think that "standing out" is more important than having some value that has semantic meaning, when that value is going to be overwritten at boot time. > So you soon find out it any code doesn't get patched. Yes, that would be nice, but if the simple patching logic is broken, you have worse problems. The patching logic really is just a couple of lines of code, and the location where we set this particular value is literally next to the place we do all our other alternatives, so if there's something wrong there, you basically have a broken system. > I also wonder how big the table of addresses to patch is. > If that gets into inlined functions it could be big. It's 4 bytes per location, so yes, it grows linearly by use - but not very much. And the patch table is in the init section that gets discarded after boot, along with all the other things like the altinstructions and things like the return sites for retpolines. Which are *much* bigger and more common. So yes, there's a table, and yes it grows, but at least in my personal config, the USER_PTR_MAX table is 424 bytes - and we free it after boot. 90% of that comes from 'access_ok()' sprinkled around and inlined. Just as a comparison, the altinstruction tables (both the pointers and the actual replacement instructions) is 25kB in that config. Also a drop in the ocean, and also something that gets free'd after init. > OTOH having a real function that does access_ok(), clac and address > masking may not problem. access_ok() itself is so rarely used these days that we could out-line it. But the code cost of a function call is likely higher than inlining the 8-byte constant and a couple of instructions: not because the call instruction itself, but because of the code generation pain around it (register save/restore etc). IOW, the call instruction is just five bytes, but it will typically cause spills and forced register choices for argument and return value. It will obviously depend a lot on the context of the function call, but to save 4 bytes for the table that gets freed, and have the 8-byte constant only once? That's a very false economy. Eg an example code sequence (from a random place in the kernel that I looked at with objdump is movabs $0x123456789abcdef,%rax cmp %rsi,%rax jb <__x64_sys_rt_sigreturn+0x20e> and that's 19 bytes. But I tried it with a function call, and that same function grew from 1320 bytes to 1324 bytes. So the function call version generated 4 bytes bigger code. I didn't figure out why, because register allocation was so different, but I do think it's exactly that: function calls cause limits on register use. So even if we didn't free the 4 byte entry after init, making access_ok() a function call would actually not be a win. And with how slow 'ret' instructions can be, we really don't want small function calls. In fact, it really makes me wonder if we should inline 'get_user()' entirely. > Especially if there is always a (PAGE sized) gap between the highest > user address and the lowest kernel address so the 'size' argument > to access_ok() can be ignored on the assumption that the accesses > are (reasonably) linear. Yes, that's what we do right now for the inline code generation anyway. (Ok, we actually do take the 'size' value into account when it's not constant, or when it's huge, but even that is just out of a silly abundance of caution and not a very common case). Linus