Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] Concurrent MMU notifiers fail to queue partial unmap range updates due to list poisoning, causing unmapped regions to retain stale attributes. - [High] Disjoint partial unmaps are silently ignored during coalescing, preventing their VMA memory attributes from being reset. --
commit 63782c32791ae66c7e1c022f8db4a4b168c0d897 Author: Himal Prasad Ghimiray <[email protected]> drm/xe/svm: Correct memory attribute reset for partial unmap When performing a partial unmap of an SVM range, the memory attributes were being reset for the entire range instead of just the portion being unmapped. This could lead to unintended side effects and behaviour. > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c > --- a/drivers/gpu/drm/xe/xe_svm.c > +++ b/drivers/gpu/drm/xe/xe_svm.c > @@ -127,6 +129,34 @@ static void xe_svm_range_free(struct drm_gpusvm_range > *range) [ ... ] > +static void xe_svm_range_set_unmapped(struct xe_svm_range *range, > + const struct mmu_notifier_range > *mmu_range) > +{ > + u64 new_start = max_t(u64, xe_svm_range_start(range), mmu_range->start); > + u64 new_end = min_t(u64, xe_svm_range_end(range), mmu_range->end); > + > + drm_gpusvm_range_set_unmapped(&range->base, mmu_range); > + if (range->base.pages.flags.partial_unmap) { > + if (range->partial_unmap.start || range->partial_unmap.end) { > + /* > + * Coalesce overlapping or adjacent intervals. > + * Disjoint ranges cannot be tracked in one interval > + * without covering still-mapped memory. > + */ > + if (new_start <= range->partial_unmap.end && > + new_end >= range->partial_unmap.start) { > + range->partial_unmap.start = > + min(range->partial_unmap.start, > new_start); > + range->partial_unmap.end = > + max(range->partial_unmap.end, new_end); > + } [Severity: High] If the new partial unmap is disjoint from the tracked interval, the overlap condition evaluates to false. Because there is no else block here, does this mean the disjoint interval is silently discarded? If so, the garbage collector might only reset the VMA attributes for the initial subrange, leaving the disjoint unmapped region with stale memory attributes. > + } else { > + range->partial_unmap.start = new_start; > + range->partial_unmap.end = new_end; > + } > + } > +} [ ... ] > @@ -135,12 +165,14 @@ xe_svm_garbage_collector_add_range(struct xe_vm *vm, > struct xe_svm_range *range, > > range_debug(range, "GARBAGE COLLECTOR ADD"); > > - drm_gpusvm_range_set_unmapped(&range->base, mmu_range); > - > spin_lock(&vm->svm.garbage_collector.lock); > + > + xe_svm_range_set_unmapped(range, mmu_range); > + > if (list_empty(&range->garbage_collector_link)) > list_add_tail(&range->garbage_collector_link, > &vm->svm.garbage_collector.range_list); > + > spin_unlock(&vm->svm.garbage_collector.lock); [Severity: High] Is it possible to lose concurrent MMU notifier updates here due to list poisoning? In xe_svm_garbage_collector(), the worker thread removes the item using list_del(), which poisons the list node pointers, and then the lock is dropped: xe_svm_garbage_collector() { ... list_del(&range->garbage_collector_link); spin_unlock(&vm->svm.garbage_collector.lock); ... } If a concurrent MMU notifier fires for the same range during this unlocked window, xe_svm_garbage_collector_add_range() acquires the lock, updates the partial unmap bounds, and evaluates the list_empty() condition above. Because the pointers are poisoned, list_empty() evaluates to false, and the range is not re-queued. Would this cause the VMA attributes for this concurrent unmap to never be reset? Note that switching to list_del_init() would allow it to be re-queued, but could that introduce a use-after-free since the worker then destroys the range? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=7
