Hi,

Adding Daniel on CC for the comment below.

On Thu, 2016-02-18 at 14:22 +0000, John Harrison wrote:
> On 20/01/2016 13:18, Joonas Lahtinen wrote:
> > On Mon, 2016-01-11 at 18:42 +0000, [email protected] wrote:
> > > From: John Harrison <[email protected]>
> > > 

<SNIP>

> > > 
> > > +                 this = node->saved_objects + i;
> > > +
> > > +                 for (j = 0; j < test->num_objs; j++) {
> > > +                         that = test->saved_objects + j;
> > > +
> > > +                         if (this->obj != that->obj)
> > > +                                 continue;
> > How about VMAs? There might be multiple mappings to an object; isn't
> > it enough to depend on the required VMA instead of the whole object?
> The object is what we get coming in from user land through the IOCTL. So 
> why make things more complicated? If there are multiple VMAs referring 
> to the same object then we can't just track an individual VMA as that 
> would lose the dependency on all the other VMAs. Just because the 
> object is mapped to someone else's address space doesn't mean that this 
> batch buffer can't overwrite data they are reading.
> 

Right, makes sense.

> > > +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
> > > +{
> > > + struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> > > + struct i915_scheduler   *scheduler = dev_priv->scheduler;
> > > + struct intel_engine_cs  *ring = qe->params.ring;
> > > + struct i915_scheduler_queue_entry  *node;
> > > + struct i915_scheduler_queue_entry  *test;
> > > + unsigned long       flags;
> > > + bool                not_flying;
> > > + int                 i, r;
> > > + int                 incomplete = 0;
> > > +
> > > + WARN_ON(!scheduler);
> > > +
> > This kind of situation should be a BUG_ON, because scheduler being
> > NULL is literally going to cause an OOPS at the next dereference,
> > which is going to happen unconditionally. A WARN followed by an OOPS
> > is exactly what BUG_ON should be used to avoid. But this should be
> > removed anyway once scheduler is made a data member of dev_priv.
> 
> The WARNs were originally BUGs but Daniel Vetter had the opposite 
> demand. His view was the driver should never BUG under any 
> circumstances. A WARN followed by an oops is better than a BUG because 
> maybe it won't actually oops.
> 

WARN_ON is better than BUG_ON when there won't be an immediate OOPS.
But with a NULL pointer dereference like the one here when scheduler is
NULL, it is 100% certain to either OOPS or cause horrible undefined
behaviour (on a non-x86 platform).

Something like:

        if (WARN_ON(!a))
                return -ENODEV;

could be what Daniel meant (if the error is propagated up and somehow
dealt with), but:

        WARN_ON(!a);
        a->foo = bar;

should simply be BUG_ON(!a), because otherwise it'll just end up being
a WARNING and an OOPS right after each other.
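
For the function above that would mean something like this at the top
(the -ENODEV is just my pick for illustration, any error code the
callers can cope with would do):

        /* Error code chosen only for illustration. */
        if (WARN_ON(!scheduler))
                return -ENODEV;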

> 
> > 
> > > + if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
> > I assume this is going to be addressed in a future commit. Could have
> > been introduced in this patch, too.
> > 
> > > +         int ret;
> > > +
> > > +         scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> > > +         ret = dev_priv->gt.execbuf_final(&qe->params);
> > > +         scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
> > > +
> > The kerneldoc should mention locking requirements of this function.
> > 
> > > +         /*
> > > +          * Don't do any clean up on failure because the caller will
> > > +          * do it all anyway.
> > > +          */
> > > +         if (ret)
> > > +                 return ret;
> > > +
> > > +         /* Free everything that is owned by the QE structure: */
> > > +         kfree(qe->params.cliprects);
> > > +         if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
> > > +                 i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
> > > +
> > > +         return 0;
> > The above piece of code looks like its own function, so it should
> > probably be one.
> > 
> > > + }
> > > +
> > > + node = kmalloc(sizeof(*node), GFP_KERNEL);
> > > + if (!node)
> > > +         return -ENOMEM;
> > > +
> > > + *node = *qe;
> > Any reason we can't simply move ownership of qe? If not, I'd rather
> > make a clone function
> 
> The qe pointer passed in is a reference to a stack local object in the 
> execbuff code path. Thus ownership cannot be transferred. Doing it this 
> way keeps the execbuff code nice and simple and all the dynamic memory 
> management and list tracking is self contained within the scheduler.
> 

I would indicate this with a const qualifier on the function parameter.
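
That is, something along these lines (just sketching the suggestion;
whether the rest of the function can honour the qualifier would need
checking):

        /* Sketch of the suggestion only: */
        int i915_scheduler_queue_execbuffer(
                        const struct i915_scheduler_queue_entry *qe);

The *node = *qe copy works just as well from a const source, and the
qualifier documents that the scheduler never takes ownership of the
caller's stack object.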

> > > +
> > > +static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
> > > +{
> > > + struct drm_i915_private *dev_priv = node->params.dev->dev_private;
> > > + struct i915_scheduler   *scheduler = dev_priv->scheduler;
> > > + struct intel_engine_cs  *ring;
> > > +
> > > + WARN_ON(!scheduler);
> > > + WARN_ON(!node);
> > > + WARN_ON(node->status != i915_sqs_popped);
> > Other states have their I915_SQS_IS_* macro, so why don't some of them?
> The purpose of the macro is to allow the combining of individual states 
> into classes. E.g. dead and complete can both be considered complete for 
> the majority of cases. Only in certain situations do you need to know 
> that it really was dead. Hence most places that don't really care just 
> use the merged macros, whereas places like this that do care use the 
> explicit enum value.
> 

Right, just asking because having a macro might increase consistency.
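
(For reference, my reading of your explanation is that the merged macro
folds dead into complete, roughly along these lines, which is my sketch
rather than the patch's literal definition:

        /* Sketch only; not the patch's literal definition. */
        #define I915_SQS_IS_COMPLETE(node) \
                ((node)->status == i915_sqs_complete || \
                 (node)->status == i915_sqs_dead)

so using the explicit enum value where the distinction matters is fine.)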

Also here, we have WARN_ON(!node) and then the next line dereferencing
node, which again is surely better off as a BUG_ON.

Although it's not unreasonable to expect a function to OOPS if you pass
it a NULL pointer. I think only the higher calling level should have an
if (WARN_ON(!node)) return ...; or BUG_ON construct before calling this
function.

Only functions handling calls from userland need to be treated with
such care that anything could be passed in, not internal functions. We
don't have BUG_ON(!dev) or BUG_ON(!dev_priv) all around either.
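
So the pattern I would expect at the call site is roughly this (the
-EINVAL is only illustrative, not from the patch):

        /* Validate once at the call site, then let the helper trust node. */
        if (WARN_ON(!node))
                return -EINVAL;

        i915_scheduler_fly_node(node);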

> > > + /*
> > > +  * In the case where the system is idle, starting 'min_seqno' from a big
> > > +  * number will cause all nodes to be removed as they are now back to
> > > +  * being in-order. However, this will be a problem if the last one to
> > > +  * complete was actually out-of-order as the ring seqno value will be
> > > +  * lower than one or more completed buffers. Thus code looking for the
> > > +  * completion of said buffers will wait forever.
> > > +  * Instead, use the hardware seqno as the starting point. This means
> > > +  * that some buffers might be kept around even in a completely idle
> > > +  * system but it should guarantee that no-one ever gets confused when
> > > +  * waiting for buffer completion.
> > > +  */
> > > + min_seqno = ring->get_seqno(ring, true);
> > > +
> > > + list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +         if (I915_SQS_IS_QUEUED(node))
> > > +                 queued++;
> > > +         else if (I915_SQS_IS_FLYING(node))
> > > +                 flying++;
> > > +         else if (I915_SQS_IS_COMPLETE(node))
> > > +                 continue;
> > > +
> > > +         if (node->params.request->seqno == 0)
> > > +                 continue;
> > > +
> > > +         if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
> > > +                 min_seqno = node->params.request->seqno;
> > > + }
> > Couldn't these values be kept cached, instead of being counted on
> > each call?
> The 'queued' and flying totals could be kept cached but min_seqno is 
> dependent upon the state of the hardware so needs to be recalculated. In 
> which case calculating the totals here is trivial and avoids having 
> extra code elsewhere to keep them up to date.
> 

Ok.

Btw, a for_each_scheduler_node() macro or some such would be great.
Those are very much preferred for readability.

$ fgrep for_each_ drivers/gpu/drm/i915/* | wc -l
525
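
Something along these lines would do, with the name and exact shape
being only a suggestion in the style of the existing i915 for_each_*
helpers:

        /* Name is just a suggestion: */
        #define for_each_scheduler_node(node, scheduler, id)            \
                list_for_each_entry((node),                             \
                                    &(scheduler)->node_queue[(id)], link)

Then the loop above reads for_each_scheduler_node(node, scheduler,
ring->id) { ... } and the intent is obvious at a glance.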

> > 
> > > +
> > > + INIT_LIST_HEAD(&remove);
> > > + list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
> > > +         /*
> > > +          * Only remove completed nodes which have a lower seqno than
> > > +          * all pending nodes. While there is the possibility of the
> > > +          * ring's seqno counting backwards, all higher buffers must
> > > +          * be remembered so that the 'i915_seqno_passed()' test can
> > > +          * report that they have in fact passed.
> > > +          *
> > > +          * NB: This is not true for 'dead' nodes. The GPU reset causes
> > > +          * the software seqno to restart from its initial value. Thus
> > > +          * the dead nodes must be removed even though their seqno values
> > > +          * are potentially vastly greater than the current ring seqno.
> > > +          */
> > > +         if (!I915_SQS_IS_COMPLETE(node))
> > > +                 continue;
> > > +
> > > +         if (node->status != i915_sqs_dead) {
> > > +                 if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
> > > +                     (node->params.request->seqno != min_seqno))
> > > +                         continue;
> > > +         }
> > > +
> > > +         list_del(&node->link);
> > > +         list_add(&node->link, &remove);
> > > +
> > > +         /* Strip the dependency info while the mutex is still locked */
> > > +         i915_scheduler_remove_dependent(scheduler, node);
> > > +
> > > +         continue;
> > > + }

<SNIP>

> > > + do {
> > > +         WARN_ON(!node);
> > > +         WARN_ON(node->params.ring != ring);
> > > +         WARN_ON(node->status != i915_sqs_popped);
> > > +         count++;
> > > +
> > > +         /*
> > > +          * The call to pop above will have removed the node from the
> > > +          * list. So add it back in and mark it as in flight.
> > > +          */
> > > +         i915_scheduler_fly_node(node);
> > Why do we want to pull an object out of the list inside the spinlock
> > and push it back immediately in our critical code path? Seems like a
> > waste for no obvious gain at this point. Why don't we rather just
> > select an entry and modify it in place, if it's going to stay in the
> > same queue anyway?
> The list order is significant. The element must be moved to the front to 
> keep the submitted items in submission order. Doing it this way also 
> keeps the code nicely partitioned and easier to understand/maintain. 
> Plus, there is a plan to optimise the code by splitting the one single 
> list into three separate ones - queued, flying, complete. If/when that 
> happens, the element will have to be removed from one list and added to 
> another.

Ok.

> 
> > 
> > > +
> > > +         scheduler->flags[ring->id] |= i915_sf_submitting;
> > > +         spin_unlock_irqrestore(&scheduler->lock, flags);
> > > +         ret = dev_priv->gt.execbuf_final(&node->params);
> > > +         spin_lock_irqsave(&scheduler->lock, flags);
> > > +         scheduler->flags[ring->id] &= ~i915_sf_submitting;
> > > +
> > > +         if (ret) {
> > > +                 int requeue = 1;
> > Multipurpose variable, not really a good idea. And as commented
> > further, should not exist at all.
> > 
> > > +
> > > +                 /*
> > > +                  * Oh dear! Either the node is broken or the ring is
> > > +                  * busy. So need to kill the node or requeue it and try
> > > +                  * again later as appropriate.
> > > +                  */
> > > +
> > > +                 switch (-ret) {
> > > +                 case ENODEV:
> > > +                 case ENOENT:
> > > +                         /* Fatal errors. Kill the node. */
> > > +                         requeue = -1;
> > > +                 break;
> > "break" indent is wrong.
> > 
> > > +
> > > +                 case EAGAIN:
> > > +                 case EBUSY:
> > > +                 case EIO:
> > > +                 case ENOMEM:
> > > +                 case ERESTARTSYS:
> > > +                 case EINTR:
> > > +                         /* Supposedly recoverable errors. */
> > > +                 break;
> > > +
> > > +                 default:
> > > +                         /*
> > > +                          * Assume the error is recoverable and hope
> > > +                          * for the best.
> > > +                          */
> > > +                         DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
> > > +                                          ring->name, ret);
> > There's MISSING_CASE macro, should use it.
> > 
> > > +                 break;
> > > +                 }
> > > +
> > Just move the code below this point into the switch; there's no point
> > in having a switch to categorize your options and then doing a bunch
> > of ifs to execute code that could be in the switch.
> One of the 'if' paths is to break out of the while loop. Can't do that 
> from inside the switch.

I do think the code could still be simplified, even if it involved a
carefully placed goto.
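
Just to illustrate the kind of shape I mean, very roughly (the labels
and the trimmed error list are placeholders, not a finished proposal):

        switch (-ret) {
        case ENODEV:
        case ENOENT:
                /* Fatal errors: drop this node. */
                goto kill_node;
        case EAGAIN:
        case EBUSY:
                /* Supposedly recoverable: requeue and retry later. */
                goto requeue_node;
        default:
                MISSING_CASE(-ret);
                goto requeue_node;
        }

with kill_node/requeue_node placed either at the end of the loop body
or after the loop, so the "break out of the while loop" case is just a
matter of where the label jumps.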

> 
> > > +                 /*
> > > +                  * Check that the watchdog/reset code has not nuked
> > > +                  * the node while we weren't looking:
> > > +                  */
> > > +                 if (node->status == i915_sqs_dead)
> > > +                         requeue = 0;

Btw, just noticed: should some locking of the node occur here? Is it
handled gracefully if the status changes right after this check, or do
we have a race condition?

> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation