On Tue, 11 Jun 2002, José Fonseca wrote:

> On 2002.06.11 20:31 Leif Delgass wrote:
> > On Tue, 11 Jun 2002, José Fonseca wrote:
> > 
> > > On 2002.06.11 18:24 Leif Delgass wrote:
> > > > On Tue, 11 Jun 2002, José Fonseca wrote:
> > > >
> > > > ...
> > > In that case we just need to ensure that the head >= the offset of the
> > > last ring entry submitted.
> > 
> > But that means essentially waiting for idle.  The ring tail is moved
> > every time we add a buffer to the ring, so the "last ring submitted" is
> > always close to the new tail and could be separated from the head by
> > many descriptors.  We just need to make sure that the card isn't going
> > to stop, so the flush needs to restart DMA if the card is idle and
> > ensure that it _won't_ stop if the card is active.
> 
> I see. But it's very complicated to know whether the card is going to
> stop or not. What we can do is have a flag which can tell us if we are
> sure the card will succeed or not. When we add a new buffer to the ring
> and, after the change, the head is still before the changed descriptor,
> then we binary "and" the previous flag value with TRUE.

This is where we have to make sure that any assumptions we make can be
verified to be true.  I haven't done enough testing yet to really determine
a surefire way of knowing that the card won't stop.  What I'm concerned
about is that the card might be doing some read-ahead buffering that we
don't know about.  That's why I was thinking we might have to see the card
actually advance a couple of times before determining it won't stop.  The
test I did with changing BM_GUI_TABLE from a buffer took a couple of
descriptors to take effect.
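
To make that concrete, here's a minimal sketch of the kind of check I have
in mind.  All the names here (read_ring_head, ring_state, POLL_LIMIT) are
made up for illustration, not the actual driver code:

#define POLL_LIMIT 10000        /* arbitrary bound on the busy-wait */

struct ring_state {
        unsigned int head;      /* last head offset we observed */
        unsigned int tail;      /* next free descriptor slot */
};

/* stands in for reading the card's current descriptor offset,
 * e.g. from BM_GUI_TABLE */
extern unsigned int read_ring_head(void);

/* Return nonzero once the head has been seen to advance twice,
 * i.e. the card has demonstrably moved past any read-ahead. */
static int head_advanced_twice(struct ring_state *ring)
{
        unsigned int last = ring->head;
        int advances = 0, i;

        for (i = 0; i < POLL_LIMIT && advances < 2; i++) {
                unsigned int head = read_ring_head();
                if (head != last) {
                        advances++;
                        last = head;
                }
        }
        ring->head = last;
        return advances == 2;
}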
 
> When we want to do that kind of flush, we check the value of that flag. If 
> it's true we can release immediately. If it's not then we must 
> wait_for_idle because the card will probably stop, restart the DMA, and 
> return.
> 
> > 
> > > > A couple of other comments on your change:  I think we could just
> > >
> > > Although there are some things you can try to simplify, this is
> > > _very_ tricky, as you're gonna see below...
> > >
> > > > unconditionally turn on busmastering in SRC_CNTL if the engine is
> > > > idle in
> > >
> > > No. The engine could be active but not doing BM, e.g., if X submitted
> > > something to the FIFO.
> > 
> > Notice that I said if the engine is _idle_.  The engine can't be active
> > if it's idle, right?  ...
> 
> Right. Sorry. I didn't understand that the first time I read this. I
> guess that it would be safe to do what you had (*) suggested but, as you
> also said below, using a variable is probably a better idea - after all,
> if it's in the cache, the speed of access is several orders of magnitude
> faster than doing anything over the bus.
> 
> (*) Not true... after seeing your diff I was reminded of a problem with
> this: note that if the card is not idle then you're gonna offset by the
> -4 entries to correct it, but if the card was processing the X FIFO
> commands that's totally wrong and it's gonna corrupt the head offset. So
> I reiterate what I said in my previous reply regarding this change.
> Nevertheless, this is a non-issue as using a variable achieves the same
> or better without reordering the code.

I don't think it's a problem if the head_addr is one behind the actual
position if there are 2D commands still in the FIFO (which could only
happen at the final descriptor on the ring).  We don't actually act on it
until the card is idle.  It just means that the last buffer in the ring
won't be reclaimed until the card is idle.  Actually, if you _did_ advance
the head while the card is active, it would trigger the error check you
added to freelist_get because head would equal tail, but the card would
still be active.
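
In other words, something like this sketch (same made-up names as before):
hold back the cached head while the engine is active, so that head == tail
with an active engine can only mean an error:

/* Sketch: don't let the cached head catch up with the tail unless the
 * engine is idle; 2D commands could still be in the FIFO at the final
 * descriptor.  engine_idle would come from e.g. a GUI_STAT check. */
static void update_cached_head(struct ring_state *ring, int engine_idle)
{
        unsigned int head = read_ring_head();

        if (head == ring->tail && !engine_idle)
                return;         /* hold back the last buffer until idle */
        ring->head = head;
}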

> > ...
> > 
> > Actually, I realized there are two writes (one to SRC_CNTL and one to
> > BUS_CNTL), so reading SRC_CNTL could be better.  As you say, we could
> > keep track of this with a variable, as long as we account for the
> > possibility of the card disabling bus mastering in the case of a lockup.
> > 
> > ...
> > >
> > > What's the point in waiting for anything? I thought we had both agreed
> > > that we shouldn't wait for anything when submitting a buffer... (after
> > > all, we both drew the same conclusions on that matter at two distinct
> > > times).
> > 
> > As I mentioned above, we just need to ensure that the card won't stop
> > because it has already read an old end-of-list flag.  We could have added
> > several descriptors to the ring while the card is still active on an
> > earlier descriptor which the card sees as the end of the ring.  So
> > restarting the card if it's idle doesn't ensure that it won't stop if
> 
> I'm not sure I understood correctly what you're saying. Note that once
> we restart the card we can be sure that it won't stop until it finishes
> _all_ the buffers we supplied up to _that_ moment.

I wasn't very clear here.  What I mean is that if the card is idle and we
restart, we should be fine.  The problem is if we _only_ do that and do
nothing when the card is active.
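
So the flush would look roughly like this (a sketch with the same
hypothetical helpers as my earlier snippets, not the actual code):

extern int engine_is_idle(void);        /* e.g. a GUI_STAT test */
extern void restart_dma(struct ring_state *ring);
extern void wait_for_idle(void);

static void flush_ring(struct ring_state *ring)
{
        if (engine_is_idle()) {
                restart_dma(ring);      /* kick off a new pass */
        } else if (!head_advanced_twice(ring)) {
                /* the card may have latched a stale end-of-list flag */
                wait_for_idle();
                restart_dma(ring);
        }
        /* otherwise the card is demonstrably still advancing and will
         * reach the current tail on its own */
}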

> > it's active.  I was just thinking that waiting to see the card advance
> > a couple of descriptors (but not waiting for idle) might be enough to
> > ensure that it isn't going to stop.
> 
> I think that having a flag indicating whether the card can stop or not is 
> more efficient. What do you think?

It depends on what's required to set a reliable flag.  That would have to
be done every time we advance the ring tail, whereas a flush ioctl is less
frequent.  With the current version of the idle ioctl (since it ensures
_all_ buffers will complete), we can remove the flush ioctls wherever they
are followed by an idle ioctl, which would just leave the flush in DDFlush
in the Mesa driver.  If an app calls glFlush, it's probably not doing it
very often (maybe once or twice a frame?).
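
To be concrete about the cost, the flag would have to be maintained on
every tail advance, something like this sketch (same made-up names as
before; head_before() is exactly the part I'm not sure we can make
reliable):

#define RING_ENTRIES 128                /* arbitrary ring size */

/* would test whether the head is still before the descriptor whose
 * end-of-list flag we just cleared */
extern int head_before(unsigned int head, unsigned int tail);

static int ring_will_not_stop = 1;      /* the proposed flag */

static void advance_tail(struct ring_state *ring)
{
        unsigned int head = read_ring_head();

        /* once FALSE, it stays FALSE until the next restart */
        ring_will_not_stop &= head_before(head, ring->tail);
        ring->tail = (ring->tail + 1) % RING_ENTRIES;
}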

> Another thing that we must address before this is using bigger buffers -
> in most of the ring dumps I see, the buffers span just one or two entries
> at maximum. There are two things: we are using a buffer for each set of
> DMA* macros (which I'm working on now); and the client-submitted buffers
> can be much bigger (they can even span several primitives). Until both of
> these things are addressed, the card is idle most of the time anyway (at
> least on my slow testbox).

The biggest problem with getting the client-submitted buffers to be used
more efficiently is state emits.  Client-side state emits aren't secure.
The current code _does_ allow multiple primitives in a buffer as long as
there is no new state between them.  AllocDmaLow will use an existing
vertex buffer until it's full or a state change causes a dispatch.
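
Roughly, the behavior is this (a sketch with made-up names, not the actual
AllocDmaLow code):

#include <stddef.h>

struct dma_buffer {
        char *address;
        size_t size;            /* total capacity in bytes */
        size_t used;            /* bytes already claimed */
};

extern void dispatch_buffer(struct dma_buffer *buf);
extern struct dma_buffer *get_new_buffer(void);

/* Hand out space from the current vertex buffer, dispatching it only
 * when it fills up or a state change forces a flush. */
static void *alloc_dma_low(struct dma_buffer **cur, size_t bytes,
                           int state_changed)
{
        struct dma_buffer *buf = *cur;

        if (buf && (state_changed || buf->used + bytes > buf->size)) {
                dispatch_buffer(buf);
                buf = NULL;
        }
        if (!buf)
                buf = *cur = get_new_buffer();

        buf->used += bytes;
        return buf->address + buf->used - bytes;
}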
 
> > ...
> > >
> > > Yep. But after thinking about it more, it's probably better to do it
> > > all in UPDATE_RING... there isn't much point in putting 2 output
> > > instructions in a separate function where the call overhead takes more
> > > instructions than that, especially if they are called only from a
> > > single function.
> > 
> > But doesn't inlining the function remove the overhead?  I was suggesting
> 
> Yes. If you inline you'll get the same thing... (I was being naughty.. :P )
> 
> > this mainly for readability, it's not really necessary if dma_start isn't
> > going to be used anywhere else.
> 
> > 
> > I'm attaching a diff so you can see what I'm talking about.  Hopefully
> > this will remove any confusion. :)
> > 
> 
> Yes it helped!
> 
> Besides my above comment, I noticed that you did something with
> RING_SPACE_TEST_WITH_RETURN, but it wasn't clear what from the diff. Is
> that code still needed or not? I've disabled it and I've been doing fine
> without it.

Yeah, I meant to explain that.  I wasn't quite done there; I still have the
macro enabled to make sure that removing it won't cause problems.  The
macro you disabled was in the old path, where it _is_ necessary because
buffers aren't added to the ring until a flush.  This path is probably
broken now anyway.  In the new path, the only advantage I can see to
having it is to reduce the number of times you call wait_ring if the ring
is almost full.  Actually, the macro can also cause a new pass to be
started if the card is idle (via UPDATE_RING_HEAD -- this is what I meant
about the flush being 'hidden' in the UPDATE_RING_HEAD macro), so it might
cause less idle time in some cases.  It shouldn't be required for things
to work, it's more a question of whether things perform better with or
without it.
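
For reference, my mental model of the macro is roughly this (a sketch,
not the actual definition; ring_space() and wait_ring() are stand-ins):

/* Testing for ring space first updates the cached head, and the head
 * update can restart DMA if the card has gone idle -- that's the
 * 'hidden' flush. */
#define RING_SPACE_TEST(ring, n)                                \
        do {                                                    \
                if (ring_space(ring) < (n)) {                   \
                        UPDATE_RING_HEAD(ring); /* may restart DMA */ \
                        if (ring_space(ring) < (n))             \
                                wait_ring((ring), (n));         \
                }                                               \
        } while (0)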

-- 
Leif Delgass 
http://www.retinalburn.net

