Thank you for your contribution! Sashiko AI review found 20 potential issue(s) 
to consider:
- [High] The driver hijacks global system suspend primitives to implement a 
hardware-specific memory resize operation, causing race conditions and 
deadlocks.
- [High] Allocated pages are freed using an excessively large size calculated 
via `get_order(size)`, leading to adjacent active memory being mistakenly 
cleared.
- [High] The `vpr->lock` mutex is dynamically allocated but never initialized.
- [High] In the chunk recycle loop, the driver uses the wrong loop variable to 
clear active and dirty bits, corrupting the chunk bitmap state.
- [High] Unchecked access to `dev->driver->pm->freeze` causes a kernel panic if 
a device has no PM ops.
- [High] Concurrent modification and traversal of the `vpr->devices` list.
- [High] Using `phys_to_page()` on `no-map` reserved memory yields invalid page 
pointers.
- [High] A race between module initialization and device probing can cause a 
NULL pointer dereference in `tegra_vpr_device_init`.
- [High] `tegra_vpr_resize` returns success (0) even if freezing a device fails 
and the hardware VPR is never resized, bypassing memory protections.
- [High] Integer truncation in the `num_pages` calculation allows massive 
DMA-BUF sizes to be allocated with 0 backing pages, leading to immediate kernel 
panics when mapped.
- [High] CMA memory is released before the VPR hardware firewall is removed.
- [Medium] The error recovery path in `tegra_vpr_recycle()` fails to restore 
the `vpr->active` bitmap.
- [Medium] Device node pointers are saved without taking reference counts, 
leading to UAF and a memory leak of the `/reserved-memory` node.
- [Medium] `tegra_vpr_find_free_region` passes `vpr->num_pages - 1` as the 
`end_page` boundary, preventing allocation of the last page of the VPR.
- [Medium] `kmalloc_array` is used to allocate an excessively large page 
pointer array.
- [Medium] Active VPR chunks are not deactivated if `dma_buf_export` fails.
- [Medium] `tegra_vpr_resize` calls `freeze_processes()` but the driver lacks a 
`CONFIG_FREEZER` or `CONFIG_PM_SLEEP` dependency.
- [Medium] Lockless iteration of `vpr->buffers` in debugfs exposes a TOCTOU 
list corruption race.
- [Medium] Negative bit shift when `vpr->size` is small, and chunk setup can 
create 0-sized chunks.
- [Low] Typo in `attach->dev` initialization results in self-assignment of a 
zero-initialized field.
--

commit 4c286d7ffd5b9cd569447863a311213170bdfb04
Author: Thierry Reding <[email protected]>
Subject: dma-buf: heaps: Add support for Tegra VPR

This commit exposes the Tegra Video-Protection-Region (VPR) as a DMA heap.
The memory region is dynamically resized by freezing userspace processes and
allocating/freeing chunks via CMA.

> diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
> index bb729e91545c4..dd6035598d028 100644
> --- a/drivers/dma-buf/heaps/Kconfig
> +++ b/drivers/dma-buf/heaps/Kconfig
[ ... ]
> @@ -20,3 +20,10 @@ config DMABUF_HEAPS_CMA
>         Choose this option to enable dma-buf CMA heap. This heap is backed
>         by the Contiguous Memory Allocator (CMA). If your system has these
>         regions, you should say Y here.
> +
> +config DMABUF_HEAPS_TEGRA_VPR
> +     bool "NVIDIA Tegra Video-Protected-Region DMA-BUF Heap"
> +     depends on DMABUF_HEAPS && DMA_CMA

[Severity: Medium]
Should DMABUF_HEAPS_TEGRA_VPR depend on CONFIG_FREEZER or CONFIG_PM_SLEEP?

Since vpr->use_freezer is true, if CONFIG_FREEZER is disabled,
freeze_processes() evaluates to a static inline function returning -ENOSYS.
This will cause the VPR resize operations to unconditionally fail.

> +     help
> +       Choose this option to enable Video-Protected-Region (VPR) support on
> +       a range of NVIDIA Tegra devices.
> diff --git a/drivers/dma-buf/heaps/tegra-vpr.c 
> b/drivers/dma-buf/heaps/tegra-vpr.c
> new file mode 100644
> index 0000000000000..918cf66ec222c
> --- /dev/null
> +++ b/drivers/dma-buf/heaps/tegra-vpr.c
[ ... ]
> +static int tegra_vpr_resize(struct tegra_vpr *vpr)
> +{
> +     struct tegra_vpr_device *node;
> +     phys_addr_t base, size;
> +     int err, status = 0;
> +
> +     err = tegra_vpr_get_extents(vpr, &base, &size);
> +     if (err < 0) {
> +             pr_err("%s(): failed to get VPR extents: %d\n", __func__, err);
> +             return err;
> +     }
> +
> +     if (vpr->use_freezer) {
> +             err = freeze_processes();

[Severity: High]
Is it safe to invoke the global task freezer from within a device driver's
allocation path?

In tegra_vpr_allocate(), mutex_lock(&vpr->lock) is held, which places the
current task in TASK_UNINTERRUPTIBLE. tegra_vpr_resize() then conditionally
calls freeze_processes(). If another userspace task is concurrently waiting
for vpr->lock, it cannot be frozen because it is in an uninterruptible sleep
state, and the freeze operation will time out and fail.

> +             if (err < 0) {
> +                     pr_err("%s(): failed to freeze processes: %d\n",
> +                            __func__, err);
> +                     return err;
> +             }
> +     }
> +
> +     list_for_each_entry(node, &vpr->devices, node) {

[Severity: High]
Are the vpr->devices list modifications protected against concurrent
traversal?

tegra_vpr_device_init() and tegra_vpr_device_release() modify the list
without holding vpr->lock, while this traversal relies on the lock. If a
device binds or unbinds while a VPR resize is active, the list pointers
could become corrupted during the traversal.

> +             err = pm_generic_freeze(node->dev);
> +             if (err < 0) {
> +                     pr_err("failed to runtime suspend %s: %d\n",
> +                            dev_name(node->dev), err);
> +                     goto thaw;

[Severity: High]
Does this error path inadvertently return success?

The status variable is initialized to 0, and if pm_generic_freeze() fails,
the code jumps to the thaw label without updating status. The function then
returns 0, falsely indicating that the hardware VPR was successfully resized,
potentially compromising memory protections.

> +             }
> +     }
[ ... ]
> +static int tegra_vpr_find_free_region_in_range(struct tegra_vpr *vpr,
> +                                            unsigned int start_page,
> +                                            unsigned int end_page,
> +                                            unsigned int num_pages,
> +                                            unsigned int align)
> +{
> +     unsigned int pos, next = ALIGN(start_page, align);
> +
> +     /* Scan through aligned positions, trying to allocate at each one */
> +     for (pos = next; pos + num_pages <= end_page; pos = next) {

[Severity: Medium]
Could this prevent the allocation of the final page of the VPR?

tegra_vpr_find_free_region() passes vpr->num_pages - 1 as the end_page
parameter. This loop condition treats end_page as an exclusive boundary.
If attempting to allocate a buffer right at the end of the VPR, this
boundary check will evaluate to false and fail the allocation.

> +             next = find_next_bit(vpr->bitmap, pos + num_pages, pos);
[ ... ]
> +static struct tegra_vpr_buffer *
> +tegra_vpr_buffer_allocate(struct tegra_vpr *vpr, size_t size)
> +{
> +     unsigned int num_pages = size >> PAGE_SHIFT;

[Severity: High]
Can the 64-bit size parameter overflow the 32-bit unsigned int num_pages?

Shifting a massive size by PAGE_SHIFT could result in a value that truncates
to 0 in num_pages. This would cause kmalloc_array(0) to return ZERO_SIZE_PTR,
which could lead to an immediate kernel panic when the buffer is later mapped.

> +     unsigned int order = get_order(size);
> +     struct tegra_vpr_buffer *buffer;
> +     unsigned long first, last;
> +     int pageno, err;
> +     pgoff_t i;
[ ... ]
> +     buffer->pages = kmalloc_array(buffer->num_pages,
> +                                   sizeof(*buffer->pages),
> +                                   GFP_KERNEL);

[Severity: Medium]
Will this order-10 contiguous allocation reliably succeed on a heavily
utilized system?

If the VPR is sized for multi-gigabyte video frames (e.g., a 2 GiB allocation
requires 524,288 pages), kmalloc_array() will require multiple megabytes of
contiguous physical memory. This is highly likely to fail due to fragmentation.
Would kvmalloc_array() be more appropriate here?

> +     if (!buffer->pages) {
> +             err = -ENOMEM;
> +             goto free;
> +     }
[ ... ]
> +static void tegra_vpr_buffer_release(struct tegra_vpr_buffer *buffer)
> +{
> +     struct tegra_vpr *vpr = buffer->vpr;
> +     struct tegra_vpr_buffer *entry;
> +     unsigned long first, last;
> +     unsigned int i;
[ ... ]
> +     bitmap_release_region(vpr->bitmap, buffer->pageno, buffer->order);

[Severity: High]
Does this logic mistakenly clear adjacent active memory?

In tegra_vpr_buffer_allocate(), the allocation sets exactly num_pages bits
in vpr->bitmap. However, here it clears bits using buffer->order which is
calculated as get_order(size). Since get_order() rounds up to the next power
of two, non-power-of-two sizes will cause too many bits to be released
(e.g., a 5-page allocation sets 5 bits but order 3 clears 8 bits). This could
mark adjacent pages belonging to other active allocations as free, leading
to reallocation and data corruption.

> +     list_del(&buffer->list);
> +     kfree(buffer->pages);
> +     kfree(buffer);
[ ... ]
> +static int tegra_vpr_attach(struct dma_buf *buf,
> +                         struct dma_buf_attachment *attachment)
> +{
> +     struct tegra_vpr_buffer *buffer = buf->priv;
> +     struct tegra_vpr_attachment *attach;
> +     int err;
> +
> +     attach = kzalloc_obj(*attach, GFP_KERNEL);
> +     if (!attach)
> +             return -ENOMEM;
> +
> +     err = sg_alloc_table_from_pages(&attach->sgt, buffer->pages,
> +                                     buffer->num_pages, 0, buffer->size,
> +                                     GFP_KERNEL);
> +     if (err < 0)
> +             goto free;
> +
> +     attach->dev = attach->dev;

[Severity: Low]
Is this field meant to be initialized from attachment->dev?

As written, it assigns the zero-allocated field back to itself, leaving
attach->dev uninitialized.

> +     INIT_LIST_HEAD(&attach->list);
> +     attachment->priv = attach;
[ ... ]
> +static void tegra_vpr_recycle(struct tegra_vpr *vpr)
> +{
> +     DECLARE_BITMAP(dirty, vpr->num_chunks);
> +     unsigned int i;
> +     int err;
> +
> +     bitmap_zero(dirty, vpr->num_chunks);
> +
> +     /*
> +      * Deactivate any unused chunks from the bottom...
> +      */
> +     for (i = 0; i < vpr->num_chunks; i++) {
> +             struct tegra_vpr_chunk *chunk = &vpr->chunks[i];
> +
> +             if (!chunk->active)
> +                     continue;
> +
> +             if (chunk->num_buffers > 0)
> +                     break;
> +
> +             err = tegra_vpr_chunk_deactivate(chunk);

[Severity: High]
Is the CMA memory released back to the general pool before the VPR hardware
firewall is removed?

tegra_vpr_chunk_deactivate() immediately calls cma_release(), making the
memory available to other subsystems. The firewall isn't removed until
tegra_vpr_resize() is called afterward. Any CPU access to the newly
re-allocated memory before the firewall is disabled could cause a hardware
bus error or CPU crash.

> +             if (err < 0)
> +                     pr_err("failed to deactivate chunk #%u\n", i);
> +             else {
> +                     clear_bit(i, vpr->active);
> +                     set_bit(i, dirty);
> +             }
> +     }
> +
> +     /*
> +      * ... and the top.
> +      */
> +     for (i = 0; i < vpr->num_chunks; i++) {
> +             unsigned int index = vpr->num_chunks - i - 1;
> +             struct tegra_vpr_chunk *chunk = &vpr->chunks[index];
> +
> +             if (!chunk->active)
> +                     continue;
> +
> +             if (chunk->num_buffers > 0)
> +                     break;
> +
> +             err = tegra_vpr_chunk_deactivate(chunk);
> +             if (err < 0)
> +                     pr_err("failed to deactivate chunk #%u\n", index);
> +             else {
> +                     clear_bit(i, vpr->active);

[Severity: High]
Could this loop corrupt the chunk bitmap state?

The top chunk loop iterates over the chunks from the top down using the
calculated index (vpr->num_chunks - i - 1), but when modifying the bitmap,
it clears bit i instead of the calculated index. This could deactivate the
top chunk in hardware but mark the bottom chunk as inactive in the software
bitmap.

> +                     set_bit(i, dirty);
> +             }
> +     }
[ ... ]
> +activate:
> +     for_each_set_bit(i, dirty, vpr->num_chunks) {
> +             err = tegra_vpr_chunk_activate(&vpr->chunks[i]);

[Severity: Medium]
Should this error recovery path also restore the bit in vpr->active?

If tegra_vpr_resize() fails, the code jumps here and successfully calls
tegra_vpr_chunk_activate(), but it doesn't set the bit in vpr->active.
Since the bit was cleared in the deactivation loop, the software state
will indicate the chunk is free, breaking internal tracking.

> +             if (WARN_ON(err < 0))
> +                     pr_err("failed to activate chunk #%u: %d\n", i, err);
> +     }
> +}
[ ... ]
> +static struct dma_buf *tegra_vpr_allocate(struct dma_heap *heap,
> +                                       unsigned long len, u32 fd_flags,
> +                                       u64 heap_flags)
> +{
> +     struct tegra_vpr *vpr = dma_heap_get_drvdata(heap);
> +     struct tegra_vpr_buffer *buffer, *entry;
> +     size_t size = ALIGN(len, vpr->align);
> +     DEFINE_DMA_BUF_EXPORT_INFO(export);
> +     struct dma_buf *buf;
> +
> +     mutex_lock(&vpr->lock);

[Severity: High]
Could the uninitialized vpr->lock cause a system crash?

The tegra_vpr structure is allocated using kzalloc_obj(), which
zero-initializes the memory, but mutex_init(&vpr->lock) is never called.
When this function attempts to acquire the lock, it interacts with an
uninitialized mutex, which can trigger warnings or a kernel panic.

> +
> +     buffer = tegra_vpr_buffer_allocate(vpr, size);
[ ... ]
> +     buf = dma_buf_export(&export);
> +     if (IS_ERR(buf))
> +             tegra_vpr_buffer_release(buffer);

[Severity: Medium]
Should the active VPR chunks be deactivated if dma_buf_export() fails?

tegra_vpr_buffer_release() decrements the chunk use counts but does not
deactivate the chunks. A missing call to tegra_vpr_recycle() could mean
chunks successfully activated during this allocation attempt remain pinned
in CMA indefinitely.

> +
> +     mutex_unlock(&vpr->lock);
> +     return buf;
> +}
> +
> +static void tegra_vpr_debugfs_show_buffers(struct tegra_vpr *vpr,
> +                                        struct seq_file *s)
> +{
> +     struct tegra_vpr_buffer *buffer;
> +     char buf[16];
> +
> +     list_for_each_entry(buffer, &vpr->buffers, list) {

[Severity: Medium]
Is it safe to traverse the shared vpr->buffers list without holding vpr->lock?

Concurrent buffer allocations or releases could modify the list pointers
mid-traversal, potentially causing a kernel panic when reading the debugfs
node.

> +             string_get_size(buffer->size, 1, STRING_UNITS_2, buf,
> +                             sizeof(buf));
> +             seq_printf(s, "  %pap-%pap (%s)\n", &buffer->start,
> +                        &buffer->limit, buf);
> +
> +     }
> +}
[ ... ]
> +static int tegra_vpr_setup_chunks(struct tegra_vpr *vpr, const char *name)
> +{
> +     phys_addr_t start, limit;
> +     unsigned int order, i;
> +     size_t max_size;
> +     int err;
> +
> +     /* This seems a reasonable value, so hard-code this for now. */
> +     vpr->num_chunks = 4;
> +
> +     vpr->chunks = kcalloc(vpr->num_chunks, sizeof(*vpr->chunks),
> +                           GFP_KERNEL);
> +     if (!vpr->chunks)
> +             return -ENOMEM;
> +
> +     max_size = PAGE_SIZE << (get_order(vpr->size) - ilog2(vpr->num_chunks));

[Severity: Medium]
Could this bit shift result in a negative shift amount?

If the device tree specifies a small VPR size, get_order(vpr->size) might
be 0, and subtracting ilog2(vpr->num_chunks) (e.g., 2) would result in -2.
Shifting PAGE_SIZE by a negative amount causes undefined behavior.

> +     order = get_order(vpr->align);
[ ... ]
> +static int __init tegra_vpr_add_heap(struct reserved_mem *rmem,
> +                                  struct device_node *np)
> +{
> +     struct dma_heap_export_info info = {};
> +     unsigned long first, last;
> +     struct dma_heap *heap;
> +     struct tegra_vpr *vpr;
> +     int err;
> +
> +     vpr = kzalloc_obj(*vpr, GFP_KERNEL);
> +     if (!vpr)
> +             return -ENOMEM;
> +
> +     INIT_LIST_HEAD(&vpr->buffers);
> +     INIT_LIST_HEAD(&vpr->devices);
> +     vpr->resizable = !of_property_read_bool(np, "no-map");
> +     vpr->use_freezer = true;
> +     vpr->dev_node = np;

[Severity: Medium]
Does this code leak the device node reference?

When storing the child node pointer in vpr->dev_node, should it be acquiring
a reference with of_node_get()? The scoped loop in tegra_vpr_init() will drop
its reference when it finishes, which could leave a dangling pointer here.

> +     vpr->align = PAGE_SIZE;
> +     vpr->base = rmem->base;
> +     vpr->size = rmem->size;
> +
> +     /* common setup */
> +     vpr->start_page = phys_to_page(vpr->base);

[Severity: High]
Can phys_to_page() safely be used if the VPR device tree node specifies no-map?

If the memory is removed from the direct mapping, there is no valid struct
page array associated with it. Calling phys_to_page() here and later
dereferencing these page pointers in tegra_vpr_attach() via
sg_alloc_table_from_pages() could cause a system crash.

> +     vpr->num_pages = vpr->size >> PAGE_SHIFT;
[ ... ]
> +static int __init tegra_vpr_init(void)
> +{
> +     const char *compatible = "nvidia,tegra-video-protection-region";
> +     struct device_node *parent;
> +     struct reserved_mem *rmem;
> +     int err;
> +
> +     parent = of_find_node_by_path("/reserved-memory");

[Severity: Medium]
Does this code leak the device node reference?

of_find_node_by_path() increments the reference count for the
/reserved-memory node, but there doesn't seem to be a matching of_node_put()
call.

> +     if (!parent)
> +             return 0;
[ ... ]
> +static int tegra_vpr_device_init(struct reserved_mem *rmem, struct device 
> *dev)
> +{
> +     struct dma_heap *heap = rmem->priv;
> +     struct tegra_vpr *vpr = dma_heap_get_drvdata(heap);

[Severity: High]
Could a hardware device probing before the module initialization runs cause
a NULL pointer dereference?

tegra_vpr_rmem_ops is registered early via RESERVEDMEM_OF_DECLARE. If a
device probes before tegra_vpr_init() executes, tegra_vpr_device_init() will
be called while rmem->priv is still NULL, causing dma_heap_get_drvdata() to
dereference a NULL pointer.

> +     struct tegra_vpr_device *node;
> +
> +     if (!dev->driver->pm->freeze || !dev->driver->pm->thaw)

[Severity: High]
Can this unconditionally dereference a NULL pointer?

If the device's driver does not provide PM operations (dev->driver->pm is
NULL), this check will cause a kernel panic during device probe.

> +             return -EINVAL;
> +
> +     node = kzalloc_obj(*node, GFP_KERNEL);
> +     if (!node)
> +             return -ENOMEM;
> +
> +     INIT_LIST_HEAD(&node->node);
> +     node->dev = dev;
> +
> +     list_add_tail(&node->node, &vpr->devices);
> +
> +     return 0;
> +}

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=8

Reply via email to