On Tue, Aug 26, 2025 at 12:34:27PM -0600, Cavitt, Jonathan wrote: > -----Original Message----- > From: Brost, Matthew <matthew.br...@intel.com> > Sent: Tuesday, August 26, 2025 11:04 AM > To: Cavitt, Jonathan <jonathan.cav...@intel.com> > Cc: intel-gfx@lists.freedesktop.org; Gupta, saurabhg > <saurabhg.gu...@intel.com>; Zuo, Alex <alex....@intel.com>; Harrison, John C > <john.c.harri...@intel.com> > Subject: Re: [PATCH 2/2] drm/xe/xe_vm: Add error injection support to lock > and prep > > > > On Tue, Aug 26, 2025 at 03:43:55PM +0000, Jonathan Cavitt wrote: > > > Add error injection support to the function > > > vm_bind_ioctl_ops_lock_and_prep. This necessitates marking the function > > > as noinline. > > > > > > Signed-off-by: Jonathan Cavitt <jonathan.cav...@intel.com> > > > Cc: Matthew Brost <matthew.br...@intel.com> > > > Cc: John Harrison <john.c.harri...@intel.com> > > > --- > > > drivers/gpu/drm/xe/xe_vm.c | 8 +++++--- > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > > > index 1a8841116e40..e527c90b8c33 100644 > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > @@ -3201,9 +3201,10 @@ static int > > > vm_bind_ioctl_ops_prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops > > > return 0; > > > } > > > > > > -static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec, > > > - struct xe_vm *vm, > > > - struct xe_vma_ops *vops) > > > +static noinline int > > > > Ideally in [1] we'd have something like this: > > > > #ifdef CONFIG_FUNCTION_ERROR_INJECTION > > #define error_injectable noinline > > #else > > #define error_injectable > > #endif > > > > That might take sometime to get through, but in the meantine can we > > define something on the Xe side for this? > > > > [1] > > https://elixir.bootlin.com/linux/v6.16.3/source/include/asm-generic/error-injection.h > > In the short term, I think something like what's done with the > xe_is_injection_active function > in xe_guc_ct.c would work. Let me try applying that. >
I would just add 'error_injectable' define like I suggest to common xe header file. We might be able to drop xe_is_injection_active too but perhaps there is reason for injecting the error after setting the CT to broken. If there is a reason for this, then keeping xe_is_injection_active makes sense. > > > > > +vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec, > > > + struct xe_vm *vm, > > > + struct xe_vma_ops *vops) > > > { > > > struct xe_vma_op *op; > > > int err; > > > @@ -3220,6 +3221,7 @@ static int vm_bind_ioctl_ops_lock_and_prep(struct > > > drm_exec *exec, > > > > > > return 0; > > > } > > > +ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_lock_and_prep, ERRNO); > > > > > > > We absolutely need the same injection points which are removed in patch > > #1 + an IGT with the same coverage. Please add similar points. VM binds > > They're already present. > No, see below. > > are a deep software pipeline that can be unwound at any point of > > failure. Different injection points trigger various unwind flows that > > need to be tested. It took me a long time to get this right, is easy to > > break, so we need good testing in place. > > AFAICT, the test xe_vm@bind-array-conflict-error-inject test only exercises > the vm_bind_ioctl_ops_lock_and_prep function. I can add additional points > to test in the test file if that's what is desired. grep for 'TEST_VM_OPS_ERROR', those are the injection points. Internally these points are iterated over on each injection. i.e., bind-array-conflict-error-inject runs a loop over each engine instance, we have 4 injection points, we more than 4 engine instances, so all 4 points will be triggered by bind-array-conflict-error-inject. We need similar coverage to this in the IGT (e.g., build a table of injection points, pick the next injection pointer each iteration. Matt > -Jonathan Cavitt > > > > > Matt > > > > > static void op_trace(struct xe_vma_op *op) > > > { > > > -- > > > 2.43.0 > > > > >