On Mon, 21 May 2018 15:49:40 +0100
Jean-Philippe Brucker <[email protected]> wrote:
> On 18/05/18 19:04, Jacob Pan wrote:
> >> +struct iopf_context {
> >> + struct device *dev;
> >> + struct iommu_fault_event evt;
> >> + struct list_head head;
> >> +};
> >> +
> >> +struct iopf_group {
> >> + struct iopf_context last_fault;
> >> + struct list_head faults;
> >> + struct work_struct work;
> >> +};
> >> +
> > All the page requests in a group should belong to the same device;
> > perhaps the struct device tracking should be in iopf_group instead
> > of iopf_context?
>
> Right, this is a leftover from when we kept a global list of partial
> faults. Since the list is now per-device, I can move the dev pointer
> (I think I should also rename iopf_context to iopf_fault, if that's
> all right.)
>
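Sounds good. For what it's worth, the reworked layout could look
something like this (a standalone sketch with stand-in types; not the
actual patch):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types, for illustration only */
struct list_head { struct list_head *prev, *next; };
struct work_struct { int pending; };
struct iommu_fault_event { unsigned long addr; int pasid; };
struct device;

/* Renamed from iopf_context */
struct iopf_fault {
	struct iommu_fault_event evt;
	struct list_head head;
};

/* All faults in a group come from one device, so track it here */
struct iopf_group {
	struct device *dev;
	struct iopf_fault last_fault;
	struct list_head faults;
	struct work_struct work;
};
```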
> >> +static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
> >> + enum page_response_code status)
> >> +{
> >> + struct page_response_msg resp = {
> >> + .addr = evt->addr,
> >> + .pasid = evt->pasid,
> >> + .pasid_present = evt->pasid_valid,
> >> + .page_req_group_id = evt->page_req_group_id,
> >> + .private_data = evt->iommu_private,
> >> + .resp_code = status,
> >> + };
> >> +
> >> + return iommu_page_response(dev, &resp);
> >> +}
> >> +
> >> +static enum page_response_code
> >> +iopf_handle_single(struct iopf_context *fault)
> >> +{
> >> + /* TODO */
> >> + return -ENODEV;
> >> +}
> >> +
> >> +static void iopf_handle_group(struct work_struct *work)
> >> +{
> >> + struct iopf_group *group;
> >> + struct iopf_context *fault, *next;
> >> + enum page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
> >> +
> >> + group = container_of(work, struct iopf_group, work);
> >> +
> >> + list_for_each_entry_safe(fault, next, &group->faults, head) {
> >> + struct iommu_fault_event *evt = &fault->evt;
> >> + /*
> >> + * Errors are sticky: don't handle subsequent faults in the
> >> + * group if there is an error.
> >> + */
> > There might be a performance benefit for certain devices in
> > continuing in spite of an error, in that the ATS retry may fault
> > less. Perhaps a policy decision for a later enhancement.
>
> Yes I think we should leave it to future work. My current reasoning is
> that subsequent requests are more likely to fail as well and reporting
> the error is more urgent, but we need real workloads to confirm this.
>
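Agreed. For reference, the sticky-error policy reduces to something
like this standalone sketch (enum names are illustrative, not the
actual patch's):

```c
#include <assert.h>

enum page_response_code {
	RESP_SUCCESS = 0,
	RESP_INVALID,
	RESP_FAILURE,
};

/*
 * Sticky-error policy: once one fault in the group fails, skip the
 * remaining handlers and report that first error for the whole group.
 */
static enum page_response_code
group_response(const enum page_response_code *results, int n)
{
	enum page_response_code status = RESP_SUCCESS;
	int i;

	for (i = 0; i < n; i++) {
		if (status == RESP_SUCCESS)
			status = results[i];
	}
	return status;
}
```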
> >> + if (status == IOMMU_PAGE_RESP_SUCCESS)
> >> + status = iopf_handle_single(fault);
> >> +
> >> + if (!evt->last_req)
> >> + kfree(fault);
> >> + }
> >> +
> >> + iopf_complete(group->last_fault.dev, &group->last_fault.evt, status);
> >> + kfree(group);
> >> +}
> >> +
> >> +/**
> >> + * iommu_queue_iopf - IO Page Fault handler
> >> + * @evt: fault event
> >> + * @cookie: struct device, passed to iommu_register_device_fault_handler.
> >> + *
> >> + * Add a fault to the device workqueue, to be handled by mm.
> >> + */
> >> +int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
> >> +{
> >> + struct iopf_group *group;
> >> + struct iopf_context *fault, *next;
> >> + struct iopf_device_param *iopf_param;
> >> +
> >> + struct device *dev = cookie;
> >> + struct iommu_param *param = dev->iommu_param;
> >> +
> >> + if (WARN_ON(!mutex_is_locked(&param->lock)))
> >> + return -EINVAL;
> >> +
> >> + if (evt->type != IOMMU_FAULT_PAGE_REQ)
> >> + /* Not a recoverable page fault */
> >> + return IOMMU_PAGE_RESP_CONTINUE;
> >> +
> >> + /*
> >> + * As long as we're holding param->lock, the queue can't be unlinked
> >> + * from the device and therefore cannot disappear.
> >> + */
> >> + iopf_param = param->iopf_param;
> >> + if (!iopf_param)
> >> + return -ENODEV;
> >> +
> >> + if (!evt->last_req) {
> >> + fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> >> + if (!fault)
> >> + return -ENOMEM;
> >> +
> >> + fault->evt = *evt;
> >> + fault->dev = dev;
> >> +
> >> + /* Non-last request of a group. Postpone until the last one */
> >> + list_add(&fault->head, &iopf_param->partial);
> >> +
> >> + return IOMMU_PAGE_RESP_HANDLED;
> >> + }
> >> +
> >> + group = kzalloc(sizeof(*group), GFP_KERNEL);
> >> + if (!group)
> >> + return -ENOMEM;
> >> +
> >> + group->last_fault.evt = *evt;
> >> + group->last_fault.dev = dev;
> >> + INIT_LIST_HEAD(&group->faults);
> >> + list_add(&group->last_fault.head, &group->faults);
> >> + INIT_WORK(&group->work, iopf_handle_group);
> >> +
> >> + /* See if we have partial faults for this group */
> >> + list_for_each_entry_safe(fault, next, &iopf_param->partial, head) {
> >> + if (fault->evt.page_req_group_id == evt->page_req_group_id)
> > If all 9 bits of the group index are used, there can be lots of
> > PRGs. As a future enhancement, maybe we can have a per-group partial
> > list ready to go when the LPIG comes in? There would be less
> > searching.
>
> Yes, allocating the PRG from the start could be more efficient. I
> think it's slightly more complicated code, so I'd rather see
> performance numbers before implementing it.
>
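Just to illustrate the idea: with the 9-bit group index, the per-PRG
lists could be preallocated per device, roughly like this (standalone
sketch, names made up):

```c
#include <assert.h>

#define PRG_INDEX_BITS	9
#define NR_PRG		(1 << PRG_INDEX_BITS)	/* 512 groups in flight */

struct list_head { struct list_head *prev, *next; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->prev = h->next = h;
}

/*
 * One partial-fault list per PRG index: when the LPIG arrives, its
 * partial faults are found in O(1) instead of scanning one big
 * per-device list.
 */
struct iopf_device_param {
	struct list_head partial[NR_PRG];
};

static void iopf_param_init(struct iopf_device_param *p)
{
	int i;

	for (i = 0; i < NR_PRG; i++)
		INIT_LIST_HEAD(&p->partial[i]);
}
```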
> >> + /* Insert *before* the last fault */
> >> + list_move(&fault->head, &group->faults);
> >> + }
> >> +
> > If you already sorted the group list with the last fault at the end,
> > why do you need a separate entry to track the last fault?
>
> Not sure I understand your question, sorry. Do you mean why the
> iopf_group.last_fault? Just to avoid one more kzalloc.
>
Kind of :) What I thought was: why can't the last_fault naturally be
the last entry in your group fault list? Then there would be no need
for special treatment in the allocation of the last fault. Just my
preference.
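i.e. something along these lines, where the embedded last fault still
ends up as the natural tail of the list (standalone sketch with
stand-in list helpers, not the actual patch):

```c
#include <assert.h>
#include <stdlib.h>

struct list_head { struct list_head *prev, *next; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->prev = h->next = h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

struct iopf_fault { int prg_id; struct list_head head; };

struct iopf_group {
	struct iopf_fault last_fault;	/* embedded: no extra allocation */
	struct list_head faults;
};

/*
 * The last fault lives inside the group (saving one kzalloc) but is
 * linked at the tail, so iteration naturally handles it last.
 */
static struct iopf_group *iopf_group_alloc(int prg_id)
{
	struct iopf_group *group = calloc(1, sizeof(*group));

	if (!group)
		return NULL;
	group->last_fault.prg_id = prg_id;
	INIT_LIST_HEAD(&group->faults);
	list_add_tail(&group->last_fault.head, &group->faults);
	return group;
}
```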
> >> +
> >> + queue->flush(queue->flush_arg, dev);
> >> +
> >> + /*
> >> + * No need to clear the partial list. All PRGs containing the PASID that
> >> + * needs to be decommissioned are whole (the device driver made sure of
> >> + * it before this function was called). They have been submitted to the
> >> + * queue by the above flush().
> >> + */
> > So you are saying the device driver needs to make sure the LPIG PRQ
> > is submitted in the flush call above, such that the partial list is
> > cleared?
>
> Not exactly, it's the IOMMU driver that makes sure all LPIGs in its
> queues are submitted by the above flush call. In more detail, the
> flow is:
>
> * Either the device driver calls unbind()/sva_device_shutdown(), or the
> process exits.
> * If the device driver initiated it, then it already told the device to stop
> using the PASID. Otherwise we use the mm_exit() callback to tell the
> device driver to stop using the PASID.
> * In either case, when receiving a stop request from the driver, the
> device sends the LPIGs to the IOMMU queue.
> * Then, the flush call above ensures that the IOMMU reports the LPIG
> with iommu_report_device_fault.
> * While submitting all LPIGs for this PASID to the work queue,
> iopf_queue_fault also picked up all partial faults, so the partial
> list is clean.
>
> Maybe I should improve this comment?
>
Thanks for explaining. LPIG submission is done by the device
asynchronously w.r.t. the driver stopping/decommissioning the PASID.
So if we were to use this flow on VT-d, which does not stall the page
request queue, should we use the IOMMU model-specific flush() callback
to ensure the LPIG is received? There could also be a queue-full
condition and retry. I am just trying to understand how and where we
can make sure the LPIG is received and all groups are whole.
> > What if there are device failures where the device needs to reset
> > (either whole function level or PASID level)? Should there still be
> > a need to clear the partial list for all associated PASIDs/groups?
>
> I guess the simplest way out, when resetting the device, is for the
> device driver to call iommu_sva_device_shutdown(). Both the IOMMU
> driver's sva_device_shutdown() and remove_device() ops should call
> iopf_queue_remove_device(), which clears the partial list.
>
Yes, I think that should work for a functional reset.
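For reference, clearing the partial list on removal could look roughly
like this (standalone sketch; stand-in list helpers instead of the
kernel's, and the freeing policy is my assumption):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *prev, *next; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->prev = h->next = h;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

struct iopf_fault { struct list_head head; };

/*
 * Drop any partial faults still queued for a device being removed or
 * reset; the outstanding PRGs will never be completed by a LPIG.
 */
static void iopf_clear_partial(struct list_head *partial)
{
	struct list_head *pos = partial->next;

	while (pos != partial) {
		struct list_head *next = pos->next;
		struct iopf_fault *fault =
			(struct iopf_fault *)((char *)pos -
					      offsetof(struct iopf_fault, head));

		free(fault);
		pos = next;
	}
	INIT_LIST_HEAD(partial);
}
```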
> Thanks,
> Jean
[Jacob Pan]
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu