Michael Ellerman <m...@ellerman.id.au> writes: > Nathan Lynch via B4 Submission Endpoint > <devnull+nathanl.linux.ibm....@kernel.org> writes: >> diff --git a/arch/powerpc/include/asm/rtas-work-area.h >> b/arch/powerpc/include/asm/rtas-work-area.h >> new file mode 100644 >> index 000000000000..76ccb039cc37 >> --- /dev/null >> +++ b/arch/powerpc/include/asm/rtas-work-area.h >> @@ -0,0 +1,45 @@ >> +/* SPDX-License-Identifier: GPL-2.0-only */ >> +#ifndef POWERPC_RTAS_WORK_AREA_H >> +#define POWERPC_RTAS_WORK_AREA_H > > The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.
OK. (will change in all new headers) >> +static struct rtas_work_area_allocator_state { >> + struct gen_pool *gen_pool; >> + char *arena; >> + struct mutex mutex; /* serializes allocations */ >> + struct wait_queue_head wqh; >> + mempool_t descriptor_pool; >> + bool available; >> +} rwa_state_ = { >> + .mutex = __MUTEX_INITIALIZER(rwa_state_.mutex), >> + .wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh), >> +}; >> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_; > > I assumed the pointer was so you could swap this out at runtime or > something, but I don't think you do. > > Any reason not to drop the pointer and just use rwa_state.foo accessors? > That would also allow the struct to be anonymous. > > Or if you have the pointer you can at least make it NULL prior to init > and avoid the need for "available". I think it's there because earlier versions of this that I never posted had unit tests. I'll either resurrect those or reduce the indirection. >> +/* >> + * A single work area buffer and descriptor to serve requests early in >> + * boot before the allocator is fully initialized. >> + */ >> +static bool early_work_area_in_use __initdata; >> +static char early_work_area_buf[SZ_4K] __initdata; > > That should be page aligned I think? Yes. It happens to be safe in this version because ibm,get-system-parameter, which has no alignment requirement, is the only function used early enough to use the buffer. But that's too fragile. >> +static struct rtas_work_area early_work_area __initdata = { >> + .buf = early_work_area_buf, >> + .size = sizeof(early_work_area_buf), >> +}; >> + >> + >> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t >> size) >> +{ >> + WARN_ON(size > early_work_area.size); >> + WARN_ON(early_work_area_in_use); >> + early_work_area_in_use = true; >> + memset(early_work_area.buf, 0, early_work_area.size); >> + return &early_work_area; >> +} >> + >> +static void __init rtas_work_area_free_early(struct rtas_work_area >> *work_area) >> +{ >> + WARN_ON(work_area != &early_work_area); >> + WARN_ON(!early_work_area_in_use); >> + early_work_area_in_use = false; >> +} >> + >> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size) >> +{ >> + struct rtas_work_area *area; >> + unsigned long addr; >> + >> + might_sleep(); >> + >> + WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ); >> + size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ); > > This seems unsafe. > > If you return a buffer smaller than the caller asks for they're likely > to read/write past the end of it and corrupt memory. OK, let's figure out another way to handle this. > AFAIK genalloc doesn't have guard pages or anything fancy to save us > from that - but maybe I'm wrong, I've never used it. Yeah we would have to build our own thing on top of it. And I don't think it could be something that traps on access, it would have to be a check in rtas_work_area_free(), after the fact. > There's only three callers in the end, seems like we should just return > NULL if the size is too large and have callers check the return value. There are more conversions to do, and a property I hope to maintain is that requests can't fail. Existing users of rtas_data_buf don't have error paths for failure to acquire the buffer. I believe the allocation size passed to rtas_work_area_alloc() can be known at build time in all cases. Maybe we could prevent inappropriate requests from being built with a compile-time assertion (untested): /* rtas-work-area.h */ static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz) { static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ); return __rtas_work_area_alloc(sz); } I think this would be OK? If I can't make it work I'll fall back to returning NULL as you suggest, but it will make for more churn (and risk) in the conversions. >> + if (!rwa_state->available) { >> + area = rtas_work_area_alloc_early(size); >> + goto out; >> + } >> + >> + /* >> + * To ensure FCFS behavior and prevent a high rate of smaller >> + * requests from starving larger ones, use the mutex to queue >> + * allocations. >> + */ >> + mutex_lock(&rwa_state->mutex); >> + wait_event(rwa_state->wqh, >> + (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0); >> + mutex_unlock(&rwa_state->mutex); >> + >> + area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL); >> + *area = (typeof(*area)){ >> + .size = size, >> + .buf = (char *)addr, >> + }; > > That is an odd way to write that :) yeah I'll change it. > >> +out: >> + pr_devel("%ps -> %s() -> buf=%p size=%zu\n", >> + (void *)_RET_IP_, __func__, area->buf, area->size); > > Can we drop those? They need a recompile to enable, so if someone needs > debugging they can just rewrite them - or use some sort of tracing > instead. Sure. >> +machine_arch_initcall(pseries, rtas_work_area_allocator_init); > > Should it live in platforms/pseries then? Yeah it probably ought to. I am pretty sure the "work area" construct is PAPR-specific, and I haven't found any evidence that it's used on non-pseries. >> +/** >> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work >> areas. >> + */ >> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit) >> +{ >> + const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN; >> + const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ; >> + const phys_addr_t min = MEMBLOCK_LOW_LIMIT; >> + const int nid = NUMA_NO_NODE; > > This should probably also be restricted to pseries? Yes.