rtas: add work area allocator

Nathan Lynch Wed, 08 Feb 2023 06:48:51 -0800

Michael Ellerman <m...@ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm....@kernel.org> writes:
>> diff --git a/arch/powerpc/include/asm/rtas-work-area.h 
>> b/arch/powerpc/include/asm/rtas-work-area.h
>> new file mode 100644
>> index 000000000000..76ccb039cc37
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/rtas-work-area.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef POWERPC_RTAS_WORK_AREA_H
>> +#define POWERPC_RTAS_WORK_AREA_H
>
> The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.


OK. (will change in all new headers)

>> +static struct rtas_work_area_allocator_state {
>> +    struct gen_pool *gen_pool;
>> +    char *arena;
>> +    struct mutex mutex; /* serializes allocations */
>> +    struct wait_queue_head wqh;
>> +    mempool_t descriptor_pool;
>> +    bool available;
>> +} rwa_state_ = {
>> +    .mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
>> +    .wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
>> +};
>> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;
>
> I assumed the pointer was so you could swap this out at runtime or
> something, but I don't think you do.
>
> Any reason not to drop the pointer and just use rwa_state.foo accessors?
> That would also allow the struct to be anonymous.
>
> Or if you have the pointer you can at least make it NULL prior to init
> and avoid the need for "available".

I think it's there because earlier versions of this that I never posted
had unit tests. I'll either resurrect those or reduce the indirection.


>> +/*
>> + * A single work area buffer and descriptor to serve requests early in
>> + * boot before the allocator is fully initialized.
>> + */
>> +static bool early_work_area_in_use __initdata;
>> +static char early_work_area_buf[SZ_4K] __initdata;
>
> That should be page aligned I think?

Yes. It happens to be safe in this version because ibm,get-system-parameter,
which has no alignment requirement, is the only function used early
enough to use the buffer. But that's too fragile.


>> +static struct rtas_work_area early_work_area __initdata = {
>> +    .buf = early_work_area_buf,
>> +    .size = sizeof(early_work_area_buf),
>> +};
>> +
>> +
>> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t 
>> size)
>> +{
>> +    WARN_ON(size > early_work_area.size);
>> +    WARN_ON(early_work_area_in_use);
>> +    early_work_area_in_use = true;
>> +    memset(early_work_area.buf, 0, early_work_area.size);
>> +    return &early_work_area;
>> +}
>> +
>> +static void __init rtas_work_area_free_early(struct rtas_work_area 
>> *work_area)
>> +{
>> +    WARN_ON(work_area != &early_work_area);
>> +    WARN_ON(!early_work_area_in_use);
>> +    early_work_area_in_use = false;
>> +}
>> +
>> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
>> +{
>> +    struct rtas_work_area *area;
>> +    unsigned long addr;
>> +
>> +    might_sleep();
>> +
>> +    WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
>> +    size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
>
> This seems unsafe.
>
> If you return a buffer smaller than the caller asks for they're likely
> to read/write past the end of it and corrupt memory.

OK, let's figure out another way to handle this.

> AFAIK genalloc doesn't have guard pages or anything fancy to save us
> from that - but maybe I'm wrong, I've never used it.

Yeah we would have to build our own thing on top of it. And I don't
think it could be something that traps on access, it would have to be a
check in rtas_work_area_free(), after the fact.

> There's only three callers in the end, seems like we should just return
> NULL if the size is too large and have callers check the return value.

There are more conversions to do, and a property I hope to maintain is
that requests can't fail. Existing users of rtas_data_buf don't have
error paths for failure to acquire the buffer.

I believe the allocation size passed to rtas_work_area_alloc() can be
known at build time in all cases. Maybe we could prevent inappropriate
requests from being built with a compile-time assertion (untested):

/* rtas-work-area.h */

static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz)
{
        static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ);
        return __rtas_work_area_alloc(sz);
}

I think this would be OK? If I can't make it work I'll fall back to
returning NULL as you suggest, but it will make for more churn (and
risk) in the conversions.


>> +    if (!rwa_state->available) {
>> +            area = rtas_work_area_alloc_early(size);
>> +            goto out;
>> +    }
>> +
>> +    /*
>> +     * To ensure FCFS behavior and prevent a high rate of smaller
>> +     * requests from starving larger ones, use the mutex to queue
>> +     * allocations.
>> +     */
>> +    mutex_lock(&rwa_state->mutex);
>> +    wait_event(rwa_state->wqh,
>> +               (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
>> +    mutex_unlock(&rwa_state->mutex);
>> +
>> +    area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
>> +    *area = (typeof(*area)){
>> +            .size = size,
>> +            .buf = (char *)addr,
>> +    };
>
> That is an odd way to write that :)

yeah I'll change it.

>
>> +out:
>> +    pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
>> +             (void *)_RET_IP_, __func__, area->buf, area->size);
>
> Can we drop those? They need a recompile to enable, so if someone needs
> debugging they can just rewrite them - or use some sort of tracing
> instead.

Sure.


>> +machine_arch_initcall(pseries, rtas_work_area_allocator_init);
>
> Should it live in platforms/pseries then?

Yeah it probably ought to. I am pretty sure the "work area" construct is
PAPR-specific, and I haven't found any evidence that it's used on
non-pseries.


>> +/**
>> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work 
>> areas.
>> + */
>> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
>> +{
>> +    const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
>> +    const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
>> +    const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
>> +    const int nid = NUMA_NO_NODE;
>
> This should probably also be restricted to pseries?

Yes.

Re: [PATCH v2 11/19] powerpc/rtas: add work area allocator

Reply via email to