Thank you for your contribution! Sashiko AI review found 20 potential issue(s) to consider: - [High] The driver hijacks global system suspend primitives to implement a hardware-specific memory resize operation, causing race conditions and deadlocks. - [High] Allocated pages are freed using an excessively large size calculated via `get_order(size)`, leading to adjacent active memory being mistakenly cleared. - [High] The `vpr->lock` mutex is dynamically allocated but never initialized. - [High] In the chunk recycle loop, the driver uses the wrong loop variable to clear active and dirty bits, corrupting the chunk bitmap state. - [High] Unchecked access to `dev->driver->pm->freeze` causes a kernel panic if a device has no PM ops. - [High] Concurrent modification and traversal of the `vpr->devices` list. - [High] Using `phys_to_page()` on `no-map` reserved memory yields invalid page pointers. - [High] A race between module initialization and device probing can cause a NULL pointer dereference in `tegra_vpr_device_init`. - [High] `tegra_vpr_resize` returns success (0) even if freezing a device fails and the hardware VPR is never resized, bypassing memory protections. - [High] Integer truncation in the `num_pages` calculation allows massive DMA-BUF sizes to be allocated with 0 backing pages, leading to immediate kernel panics when mapped. - [High] CMA memory is released before the VPR hardware firewall is removed. - [Medium] The error recovery path in `tegra_vpr_recycle()` fails to restore the `vpr->active` bitmap. - [Medium] Device node pointers are saved without taking reference counts, leading to UAF and a memory leak of the `/reserved-memory` node. - [Medium] `tegra_vpr_find_free_region` passes `vpr->num_pages - 1` as the `end_page` boundary, preventing allocation of the last page of the VPR. - [Medium] `kmalloc_array` is used to allocate an excessively large page pointer array. - [Medium] Active VPR chunks are not deactivated if `dma_buf_export` fails. - [Medium] `tegra_vpr_resize` calls `freeze_processes()` but the driver lacks a `CONFIG_FREEZER` or `CONFIG_PM_SLEEP` dependency. - [Medium] Lockless iteration of `vpr->buffers` in debugfs exposes a TOCTOU list corruption race. - [Medium] Negative bit shift when `vpr->size` is small, and chunk setup can create 0-sized chunks. - [Low] Typo in `attach->dev` initialization results in self-assignment of a zero-initialized field. --
commit 4c286d7ffd5b9cd569447863a311213170bdfb04 Author: Thierry Reding <[email protected]> Subject: dma-buf: heaps: Add support for Tegra VPR This commit exposes the Tegra Video-Protection-Region (VPR) as a DMA heap. The memory region is dynamically resized by freezing userspace processes and allocating/freeing chunks via CMA. > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > index bb729e91545c4..dd6035598d028 100644 > --- a/drivers/dma-buf/heaps/Kconfig > +++ b/drivers/dma-buf/heaps/Kconfig [ ... ] > @@ -20,3 +20,10 @@ config DMABUF_HEAPS_CMA > Choose this option to enable dma-buf CMA heap. This heap is backed > by the Contiguous Memory Allocator (CMA). If your system has these > regions, you should say Y here. > + > +config DMABUF_HEAPS_TEGRA_VPR > + bool "NVIDIA Tegra Video-Protected-Region DMA-BUF Heap" > + depends on DMABUF_HEAPS && DMA_CMA [Severity: Medium] Should DMABUF_HEAPS_TEGRA_VPR depend on CONFIG_FREEZER or CONFIG_PM_SLEEP? Since vpr->use_freezer is true, if CONFIG_FREEZER is disabled, freeze_processes() evaluates to a static inline function returning -ENOSYS. This will cause the VPR resize operations to unconditionally fail. > + help > + Choose this option to enable Video-Protected-Region (VPR) support on > + a range of NVIDIA Tegra devices. > diff --git a/drivers/dma-buf/heaps/tegra-vpr.c > b/drivers/dma-buf/heaps/tegra-vpr.c > new file mode 100644 > index 0000000000000..918cf66ec222c > --- /dev/null > +++ b/drivers/dma-buf/heaps/tegra-vpr.c [ ... ] > +static int tegra_vpr_resize(struct tegra_vpr *vpr) > +{ > + struct tegra_vpr_device *node; > + phys_addr_t base, size; > + int err, status = 0; > + > + err = tegra_vpr_get_extents(vpr, &base, &size); > + if (err < 0) { > + pr_err("%s(): failed to get VPR extents: %d\n", __func__, err); > + return err; > + } > + > + if (vpr->use_freezer) { > + err = freeze_processes(); [Severity: High] Is it safe to invoke the global task freezer from within a device driver's allocation path? In tegra_vpr_allocate(), mutex_lock(&vpr->lock) is held, which places the current task in TASK_UNINTERRUPTIBLE. tegra_vpr_resize() then conditionally calls freeze_processes(). If another userspace task is concurrently waiting for vpr->lock, it cannot be frozen because it is in an uninterruptible sleep state, and the freeze operation will time out and fail. > + if (err < 0) { > + pr_err("%s(): failed to freeze processes: %d\n", > + __func__, err); > + return err; > + } > + } > + > + list_for_each_entry(node, &vpr->devices, node) { [Severity: High] Are the vpr->devices list modifications protected against concurrent traversal? tegra_vpr_device_init() and tegra_vpr_device_release() modify the list without holding vpr->lock, while this traversal relies on the lock. If a device binds or unbinds while a VPR resize is active, the list pointers could become corrupted during the traversal. > + err = pm_generic_freeze(node->dev); > + if (err < 0) { > + pr_err("failed to runtime suspend %s: %d\n", > + dev_name(node->dev), err); > + goto thaw; [Severity: High] Does this error path inadvertently return success? The status variable is initialized to 0, and if pm_generic_freeze() fails, the code jumps to the thaw label without updating status. The function then returns 0, falsely indicating that the hardware VPR was successfully resized, potentially compromising memory protections. > + } > + } [ ... ] > +static int tegra_vpr_find_free_region_in_range(struct tegra_vpr *vpr, > + unsigned int start_page, > + unsigned int end_page, > + unsigned int num_pages, > + unsigned int align) > +{ > + unsigned int pos, next = ALIGN(start_page, align); > + > + /* Scan through aligned positions, trying to allocate at each one */ > + for (pos = next; pos + num_pages <= end_page; pos = next) { [Severity: Medium] Could this prevent the allocation of the final page of the VPR? tegra_vpr_find_free_region() passes vpr->num_pages - 1 as the end_page parameter. This loop condition treats end_page as an exclusive boundary. If attempting to allocate a buffer right at the end of the VPR, this boundary check will evaluate to false and fail the allocation. > + next = find_next_bit(vpr->bitmap, pos + num_pages, pos); [ ... ] > +static struct tegra_vpr_buffer * > +tegra_vpr_buffer_allocate(struct tegra_vpr *vpr, size_t size) > +{ > + unsigned int num_pages = size >> PAGE_SHIFT; [Severity: High] Can the 64-bit size parameter overflow the 32-bit unsigned int num_pages? Shifting a massive size by PAGE_SHIFT could result in a value that truncates to 0 in num_pages. This would cause kmalloc_array(0) to return ZERO_SIZE_PTR, which could lead to an immediate kernel panic when the buffer is later mapped. > + unsigned int order = get_order(size); > + struct tegra_vpr_buffer *buffer; > + unsigned long first, last; > + int pageno, err; > + pgoff_t i; [ ... ] > + buffer->pages = kmalloc_array(buffer->num_pages, > + sizeof(*buffer->pages), > + GFP_KERNEL); [Severity: Medium] Will this order-10 contiguous allocation reliably succeed on a heavily utilized system? If the VPR is sized for multi-gigabyte video frames (e.g., a 2 GiB allocation requires 524,288 pages), kmalloc_array() will require multiple megabytes of contiguous physical memory. This is highly likely to fail due to fragmentation. Would kvmalloc_array() be more appropriate here? > + if (!buffer->pages) { > + err = -ENOMEM; > + goto free; > + } [ ... ] > +static void tegra_vpr_buffer_release(struct tegra_vpr_buffer *buffer) > +{ > + struct tegra_vpr *vpr = buffer->vpr; > + struct tegra_vpr_buffer *entry; > + unsigned long first, last; > + unsigned int i; [ ... ] > + bitmap_release_region(vpr->bitmap, buffer->pageno, buffer->order); [Severity: High] Does this logic mistakenly clear adjacent active memory? In tegra_vpr_buffer_allocate(), the allocation sets exactly num_pages bits in vpr->bitmap. However, here it clears bits using buffer->order which is calculated as get_order(size). Since get_order() rounds up to the next power of two, non-power-of-two sizes will cause too many bits to be released (e.g., a 5-page allocation sets 5 bits but order 3 clears 8 bits). This could mark adjacent pages belonging to other active allocations as free, leading to reallocation and data corruption. > + list_del(&buffer->list); > + kfree(buffer->pages); > + kfree(buffer); [ ... ] > +static int tegra_vpr_attach(struct dma_buf *buf, > + struct dma_buf_attachment *attachment) > +{ > + struct tegra_vpr_buffer *buffer = buf->priv; > + struct tegra_vpr_attachment *attach; > + int err; > + > + attach = kzalloc_obj(*attach, GFP_KERNEL); > + if (!attach) > + return -ENOMEM; > + > + err = sg_alloc_table_from_pages(&attach->sgt, buffer->pages, > + buffer->num_pages, 0, buffer->size, > + GFP_KERNEL); > + if (err < 0) > + goto free; > + > + attach->dev = attach->dev; [Severity: Low] Is this field meant to be initialized from attachment->dev? As written, it assigns the zero-allocated field back to itself, leaving attach->dev uninitialized. > + INIT_LIST_HEAD(&attach->list); > + attachment->priv = attach; [ ... ] > +static void tegra_vpr_recycle(struct tegra_vpr *vpr) > +{ > + DECLARE_BITMAP(dirty, vpr->num_chunks); > + unsigned int i; > + int err; > + > + bitmap_zero(dirty, vpr->num_chunks); > + > + /* > + * Deactivate any unused chunks from the bottom... > + */ > + for (i = 0; i < vpr->num_chunks; i++) { > + struct tegra_vpr_chunk *chunk = &vpr->chunks[i]; > + > + if (!chunk->active) > + continue; > + > + if (chunk->num_buffers > 0) > + break; > + > + err = tegra_vpr_chunk_deactivate(chunk); [Severity: High] Is the CMA memory released back to the general pool before the VPR hardware firewall is removed? tegra_vpr_chunk_deactivate() immediately calls cma_release(), making the memory available to other subsystems. The firewall isn't removed until tegra_vpr_resize() is called afterward. Any CPU access to the newly re-allocated memory before the firewall is disabled could cause a hardware bus error or CPU crash. > + if (err < 0) > + pr_err("failed to deactivate chunk #%u\n", i); > + else { > + clear_bit(i, vpr->active); > + set_bit(i, dirty); > + } > + } > + > + /* > + * ... and the top. > + */ > + for (i = 0; i < vpr->num_chunks; i++) { > + unsigned int index = vpr->num_chunks - i - 1; > + struct tegra_vpr_chunk *chunk = &vpr->chunks[index]; > + > + if (!chunk->active) > + continue; > + > + if (chunk->num_buffers > 0) > + break; > + > + err = tegra_vpr_chunk_deactivate(chunk); > + if (err < 0) > + pr_err("failed to deactivate chunk #%u\n", index); > + else { > + clear_bit(i, vpr->active); [Severity: High] Could this loop corrupt the chunk bitmap state? The top chunk loop iterates over the chunks from the top down using the calculated index (vpr->num_chunks - i - 1), but when modifying the bitmap, it clears bit i instead of the calculated index. This could deactivate the top chunk in hardware but mark the bottom chunk as inactive in the software bitmap. > + set_bit(i, dirty); > + } > + } [ ... ] > +activate: > + for_each_set_bit(i, dirty, vpr->num_chunks) { > + err = tegra_vpr_chunk_activate(&vpr->chunks[i]); [Severity: Medium] Should this error recovery path also restore the bit in vpr->active? If tegra_vpr_resize() fails, the code jumps here and successfully calls tegra_vpr_chunk_activate(), but it doesn't set the bit in vpr->active. Since the bit was cleared in the deactivation loop, the software state will indicate the chunk is free, breaking internal tracking. > + if (WARN_ON(err < 0)) > + pr_err("failed to activate chunk #%u: %d\n", i, err); > + } > +} [ ... ] > +static struct dma_buf *tegra_vpr_allocate(struct dma_heap *heap, > + unsigned long len, u32 fd_flags, > + u64 heap_flags) > +{ > + struct tegra_vpr *vpr = dma_heap_get_drvdata(heap); > + struct tegra_vpr_buffer *buffer, *entry; > + size_t size = ALIGN(len, vpr->align); > + DEFINE_DMA_BUF_EXPORT_INFO(export); > + struct dma_buf *buf; > + > + mutex_lock(&vpr->lock); [Severity: High] Could the uninitialized vpr->lock cause a system crash? The tegra_vpr structure is allocated using kzalloc_obj(), which zero-initializes the memory, but mutex_init(&vpr->lock) is never called. When this function attempts to acquire the lock, it interacts with an uninitialized mutex, which can trigger warnings or a kernel panic. > + > + buffer = tegra_vpr_buffer_allocate(vpr, size); [ ... ] > + buf = dma_buf_export(&export); > + if (IS_ERR(buf)) > + tegra_vpr_buffer_release(buffer); [Severity: Medium] Should the active VPR chunks be deactivated if dma_buf_export() fails? tegra_vpr_buffer_release() decrements the chunk use counts but does not deactivate the chunks. A missing call to tegra_vpr_recycle() could mean chunks successfully activated during this allocation attempt remain pinned in CMA indefinitely. > + > + mutex_unlock(&vpr->lock); > + return buf; > +} > + > +static void tegra_vpr_debugfs_show_buffers(struct tegra_vpr *vpr, > + struct seq_file *s) > +{ > + struct tegra_vpr_buffer *buffer; > + char buf[16]; > + > + list_for_each_entry(buffer, &vpr->buffers, list) { [Severity: Medium] Is it safe to traverse the shared vpr->buffers list without holding vpr->lock? Concurrent buffer allocations or releases could modify the list pointers mid-traversal, potentially causing a kernel panic when reading the debugfs node. > + string_get_size(buffer->size, 1, STRING_UNITS_2, buf, > + sizeof(buf)); > + seq_printf(s, " %pap-%pap (%s)\n", &buffer->start, > + &buffer->limit, buf); > + > + } > +} [ ... ] > +static int tegra_vpr_setup_chunks(struct tegra_vpr *vpr, const char *name) > +{ > + phys_addr_t start, limit; > + unsigned int order, i; > + size_t max_size; > + int err; > + > + /* This seems a reasonable value, so hard-code this for now. */ > + vpr->num_chunks = 4; > + > + vpr->chunks = kcalloc(vpr->num_chunks, sizeof(*vpr->chunks), > + GFP_KERNEL); > + if (!vpr->chunks) > + return -ENOMEM; > + > + max_size = PAGE_SIZE << (get_order(vpr->size) - ilog2(vpr->num_chunks)); [Severity: Medium] Could this bit shift result in a negative shift amount? If the device tree specifies a small VPR size, get_order(vpr->size) might be 0, and subtracting ilog2(vpr->num_chunks) (e.g., 2) would result in -2. Shifting PAGE_SIZE by a negative amount causes undefined behavior. > + order = get_order(vpr->align); [ ... ] > +static int __init tegra_vpr_add_heap(struct reserved_mem *rmem, > + struct device_node *np) > +{ > + struct dma_heap_export_info info = {}; > + unsigned long first, last; > + struct dma_heap *heap; > + struct tegra_vpr *vpr; > + int err; > + > + vpr = kzalloc_obj(*vpr, GFP_KERNEL); > + if (!vpr) > + return -ENOMEM; > + > + INIT_LIST_HEAD(&vpr->buffers); > + INIT_LIST_HEAD(&vpr->devices); > + vpr->resizable = !of_property_read_bool(np, "no-map"); > + vpr->use_freezer = true; > + vpr->dev_node = np; [Severity: Medium] Does this code leak the device node reference? When storing the child node pointer in vpr->dev_node, should it be acquiring a reference with of_node_get()? The scoped loop in tegra_vpr_init() will drop its reference when it finishes, which could leave a dangling pointer here. > + vpr->align = PAGE_SIZE; > + vpr->base = rmem->base; > + vpr->size = rmem->size; > + > + /* common setup */ > + vpr->start_page = phys_to_page(vpr->base); [Severity: High] Can phys_to_page() safely be used if the VPR device tree node specifies no-map? If the memory is removed from the direct mapping, there is no valid struct page array associated with it. Calling phys_to_page() here and later dereferencing these page pointers in tegra_vpr_attach() via sg_alloc_table_from_pages() could cause a system crash. > + vpr->num_pages = vpr->size >> PAGE_SHIFT; [ ... ] > +static int __init tegra_vpr_init(void) > +{ > + const char *compatible = "nvidia,tegra-video-protection-region"; > + struct device_node *parent; > + struct reserved_mem *rmem; > + int err; > + > + parent = of_find_node_by_path("/reserved-memory"); [Severity: Medium] Does this code leak the device node reference? of_find_node_by_path() increments the reference count for the /reserved-memory node, but there doesn't seem to be a matching of_node_put() call. > + if (!parent) > + return 0; [ ... ] > +static int tegra_vpr_device_init(struct reserved_mem *rmem, struct device > *dev) > +{ > + struct dma_heap *heap = rmem->priv; > + struct tegra_vpr *vpr = dma_heap_get_drvdata(heap); [Severity: High] Could a hardware device probing before the module initialization runs cause a NULL pointer dereference? tegra_vpr_rmem_ops is registered early via RESERVEDMEM_OF_DECLARE. If a device probes before tegra_vpr_init() executes, tegra_vpr_device_init() will be called while rmem->priv is still NULL, causing dma_heap_get_drvdata() to dereference a NULL pointer. > + struct tegra_vpr_device *node; > + > + if (!dev->driver->pm->freeze || !dev->driver->pm->thaw) [Severity: High] Can this unconditionally dereference a NULL pointer? If the device's driver does not provide PM operations (dev->driver->pm is NULL), this check will cause a kernel panic during device probe. > + return -EINVAL; > + > + node = kzalloc_obj(*node, GFP_KERNEL); > + if (!node) > + return -ENOMEM; > + > + INIT_LIST_HEAD(&node->node); > + node->dev = dev; > + > + list_add_tail(&node->node, &vpr->devices); > + > + return 0; > +} -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=8
