On Sun, Jul 28, 2019 at 02:28:18PM +0300, Oded Gabbay wrote:
> This patch removes the limitation of a single process that can open the
> device.
>
> Now, there is no limitation on the number of processes that can open the
> device and have a valid FD.
>
> However, only a single process can perform compute operations. This is
> enforced by allowing only a single process to have a compute context.
>
> Signed-off-by: Oded Gabbay <oded.gab...@gmail.com>
> ---
>  drivers/misc/habanalabs/context.c          | 100 +++++++++++++++------
>  drivers/misc/habanalabs/device.c           |  18 ++--
>  drivers/misc/habanalabs/habanalabs.h       |   1 -
>  drivers/misc/habanalabs/habanalabs_drv.c   |   8 --
>  drivers/misc/habanalabs/habanalabs_ioctl.c |   7 +-
>  5 files changed, 85 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/context.c b/drivers/misc/habanalabs/context.c
> index 57bbe59da9b6..f64220fc3a55 100644
> --- a/drivers/misc/habanalabs/context.c
> +++ b/drivers/misc/habanalabs/context.c
> @@ -56,7 +56,7 @@ void hl_ctx_do_release(struct kref *ref)
>  	kfree(ctx);
>  }
>
> -int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
> +static int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
>  {
>  	struct hl_ctx_mgr *mgr = &hpriv->ctx_mgr;
>  	struct hl_ctx *ctx;
> @@ -89,9 +89,6 @@ int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
>  	/* TODO: remove for multiple contexts per process */
>  	hpriv->ctx = ctx;
>
> -	/* TODO: remove the following line for multiple process support */
> -	hdev->compute_ctx = ctx;
> -
>  	return 0;
>
>  remove_from_idr:
> @@ -206,13 +203,22 @@ bool hl_ctx_is_valid(struct hl_fpriv *hpriv, bool requires_compute_ctx)
>  	int rc;
>
>  	/* First thing, to minimize latency impact, check if context exists.
> -	 * Also check if it matches the requirements. If so, exit immediately
> +	 * This is relevant for the "steady state", where a process context
> +	 * already exists, and we want to minimize the latency in command
> +	 * submissions. In that case, we want to see if we can quickly exit
> +	 * with a valid answer.
> +	 *
> +	 * If a context doesn't exists, we must grab the mutex. Otherwise,
> +	 * there can be nasty races in case of multi-threaded application.
> +	 *
> +	 * So, if the context exists and we don't need a compute context,
> +	 * that's fine. If it exists and the context we have is the compute
> +	 * context, that's also fine. Other then that, we can't check anything
> +	 * without the mutex.
>  	 */
> -	if (hpriv->ctx) {
> -		if ((requires_compute_ctx) && (hdev->compute_ctx != hpriv->ctx))
> -			return false;
> +	if ((hpriv->ctx) && ((!requires_compute_ctx) ||
> +				(hdev->compute_ctx == hpriv->ctx)))
>  		return true;
> -	}
>
>  	mutex_lock(&hdev->lazy_ctx_creation_lock);
>
> @@ -222,35 +228,73 @@ bool hl_ctx_is_valid(struct hl_fpriv *hpriv, bool requires_compute_ctx)
>  	 * creation of a context
>  	 */
>  	if (hpriv->ctx) {
> -		if ((requires_compute_ctx) && (hdev->compute_ctx != hpriv->ctx))
> +		if ((!requires_compute_ctx) ||
> +				(hdev->compute_ctx == hpriv->ctx))
> +			goto unlock_mutex;
> +
> +		if (hdev->compute_ctx) {
>  			valid = false;
> -		goto unlock_mutex;
> -	}
> +			goto unlock_mutex;
> +		}
>
> -	/* If we already have a compute context, there is no point
> -	 * of creating one in case we are called from ioctl that needs
> -	 * a compute context
> -	 */
> -	if ((hdev->compute_ctx) && (requires_compute_ctx)) {
> +		/* If we reached here, it means we have a non-compute context,
> +		 * but there is no compute context on the device. Therefore,
> +		 * we can try to "upgrade" the existing context to a compute
> +		 * context
> +		 */
> +		dev_dbg_ratelimited(hdev->dev,
> +				"Non-compute context %d exists\n",
> +				hpriv->ctx->asid);
> +
> +	} else if ((hdev->compute_ctx) && (requires_compute_ctx)) {
> +
> +		/* If we already have a compute context in the device, there is
> +		 * no point of creating one in case we are called from ioctl
> +		 * that needs a compute context
> +		 */
>  		dev_err(hdev->dev,
>  			"Can't create new compute context as one already exists\n");
>  		valid = false;
>  		goto unlock_mutex;
> -	}
> +	} else {
> +		/* If we reached here it is because there isn't a context for
> +		 * the process AND there is no compute context or compute
> +		 * context wasn't required. In any case, must create a context
> +		 * for the process
> +		 */
>
> -	rc = hl_ctx_create(hdev, hpriv);
> -	if (rc) {
> -		dev_err(hdev->dev, "Failed to create context %d\n", rc);
> -		valid = false;
> -		goto unlock_mutex;
> +		rc = hl_ctx_create(hdev, hpriv);
> +		if (rc) {
> +			dev_err(hdev->dev, "Failed to create context %d\n", rc);
> +			valid = false;
> +			goto unlock_mutex;
> +		}
> +
> +		dev_dbg_ratelimited(hdev->dev, "Created context %d\n",
> +				hpriv->ctx->asid);
>  	}
>
> -	/* Device is IDLE at this point so it is legal to change PLLs.
> -	 * There is no need to check anything because if the PLL is
> -	 * already HIGH, the set function will return without doing
> -	 * anything
> +	/* If we reached here then either we have a new context, or we can
> +	 * upgrade a non-compute context to a compute context. Do the upgrade
> +	 * only if the caller required a compute context
>  	 */
> -	hl_device_set_frequency(hdev, PLL_HIGH);
> +	if (requires_compute_ctx) {
> +		WARN(hdev->compute_ctx,
> +			"Compute context exists but driver is setting a new one");

This will trigger syzbot and will reboot machines that have
'panic-on-warn' set (i.e. all cloud systems).  So be _VERY_ careful about
this.  If a user can trigger this, do not use WARN(), that's not what it
is for.

thanks,

greg k-h
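
To illustrate the point about WARN(), here is a minimal userspace sketch of
the alternative: treat "a compute context already exists" as an ordinary
error that is logged and returned to the caller, not as a kernel warning.
This is not the driver's code. The struct definitions are cut-down
stand-ins, set_compute_ctx() is a hypothetical helper name, fprintf() stands
in for dev_err(), and returning -EBUSY is an assumption about what a
reasonable error code might be.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-ins for the driver's structures. */
struct hl_ctx {
	int asid;
};

struct hl_device {
	struct hl_ctx *compute_ctx;	/* NULL when no compute context is set */
};

/*
 * Hypothetical helper: make ctx the device's compute context.
 *
 * If a different compute context is already set, log an error and return
 * -EBUSY. No WARN() is used, because this condition could be reached by
 * an ordinary user request.
 */
static int set_compute_ctx(struct hl_device *hdev, struct hl_ctx *ctx)
{
	if (hdev->compute_ctx && hdev->compute_ctx != ctx) {
		/* fprintf() stands in for dev_err() in this userspace mock-up */
		fprintf(stderr, "compute context %d already exists\n",
			hdev->compute_ctx->asid);
		return -EBUSY;
	}

	hdev->compute_ctx = ctx;
	return 0;
}
```

A caller would check the return value and fail the request, leaving the
existing compute context untouched.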