On Fri, 31 May 2002, José Fonseca wrote:

> On 2002.05.31 02:53 Leif Delgass wrote:
> > On Thu, 30 May 2002, José Fonseca wrote:
> > 
> > I've fixed one bug already that was related to the ring tail being left
> > pre-incremented (the table_end I had before wasn't).  Some of my problems
> > were related to not flushing on swaps (flickering on the quake menu), but
> > this may not be a new problem.  The bug that remains is that there are
> > some rendering problems, seemingly random garbage polygons here and there
> > in quake3.
I found and fixed this problem (I was using ring.tail to set the offset for
buffer "aging" before it was incremented).

> > Another thing I realized (not related to these changes) was a problem
> > with moving a window leaving frames behind.  We need to make sure DMA is
> > quiescent in the XAA sync (flush before waiting for idle) so that there
> > aren't frames left in the queue with old clipping coords.

I've made this change.  This problem was not as noticeable with the flush on
swaps, but I've commented that out now.

> > > - This code in the OUT_RING macro
> > > 
> > > +	if (idx > 3) idx = 0; \
> > > +	if (idx == DMA_COMMAND) { \
> > > +		last_cmd = (x) & ~DMA_EOL; \
> > > +		last_cmd_ofs = write; \
> > > +	} \
> > > +	ring[write++] = cpu_to_le32( x ); \
> > > +	idx++; \
> > > +	write &= tail_mask; \
> > > 
> > > seems unnecessary and inefficient.  We don't need to bookkeep idx,
> > > last_cmd and last_cmd_ofs when writing to the ring.  We just need to
> > > make a copy of the tail in the beginning, and can get all the info we
> > > need from that in COMMIT_RING.

I've cleaned this up a bit and done the work in COMMIT_RING.

> > > - I think mach64_flush_write_combine is excessive and creates an
> > > excessive penalty as it calls mb().  We just need to make sure that
> > > everything we write goes to the physical memory, i.e., call wb() -
> > > which in the x86 platform is a null macro since there is no write
> > > cache.  And this should only be necessary in COMMIT_RING, as no-one
> > > else will read what we are writing until we toggle the continuation
> > > bit of the last descriptor.
> > 
> > The above code and the mb()'s are just quick hacking and testing.  I
> > always assumed that flush_write_combine referred to the mtrr for the AGP
> > aperture on x86, which is a kind of write cache, right?
> 
> mmm.. I really don't know that much about mtrr.  Documentation/mtrr.txt
> says that it's indeed a write cache, but on the bus side, not
> processor<->memory.  So I don't see what mb() buys us: in the case of the
> x86 arch it is equal to rmb(), which just ensures that read operations are
> completed.  The idea I have is that the memory barriers are only good to
> force ordering in read/write operations to memory, not buses..

OK, I'm still not sure if I have this right, but I reduced this to a wmb()
between setting the EOL flag on the new tail and clearing the EOL flag from
the old tail (there's a rough sketch of what I mean further down).  We need
to ensure that the EOL flag on the new tail is in place first so there's no
chance for the card to read past the end of our valid data.

> > > - Why do you call mach64_do_dma_flush( dev_priv ) in COMMIT_RING?
> > 
> > Again, that was just from my hacking/testing.

OK, this is still there and the reason is that the flush (for this path)
checks for idle and starts a new pass if there are descriptors waiting on
the ring.  There's still a problem here because a new pass is started even
with just one descriptor in the ring.  Eventually one pass takes long enough
for a queue to build up, but this isn't optimal.  I also added a flush (only
for this path) in freelist_get before looking for buffers to reclaim (if the
freelist is empty), since the card could be idle at that point.

> > > ...but I did find something that annoyed me the most:
> > > 
> > > - I want to play with the nice stuff you've been making!  Can you
> > > commit, plz!  Pretty please! ;-)
> > 
> > OK, I'll try to do this soon.  I was not feeling well today so I didn't
> > get much done.
> 
> No problem.  I hope you get better soon.

Thanks, I'm feeling better now. :)
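To make the ordering concrete, here's a rough, uncompiled sketch of the idea
(not the code I committed).  It assumes the descriptor layout from the
OUT_RING snippet quoted above, takes the ring offset of the command/flags
dword of the new last descriptor directly, and uses the kernel's wmb() and
cpu_to_le32(); the ring struct and its start/tail_cmd_ofs fields are made up
for illustration:

/* Sketch only: splice newly written descriptors into the live ring.
 * new_cmd_ofs is the ring offset of the command/flags dword of the last
 * descriptor just written; ring->tail_cmd_ofs (made-up field) remembers
 * the command dword of the previous end-of-list entry.
 */
static __inline__ void mach64_commit_ring_sketch( drm_mach64_ring_t *ring,
                                                  int new_cmd_ofs )
{
        u32 *r = (u32 *) ring->start;

        /* Mark the new last descriptor as end-of-list first. */
        r[new_cmd_ofs] |= cpu_to_le32( DMA_EOL );

        /* Make sure the new EOL is actually in memory before the old one
         * goes away, so the card can never walk past the end of the valid
         * descriptors.
         */
        wmb();

        /* Now clear EOL on the old tail to link in the new entries. */
        r[ring->tail_cmd_ofs] &= cpu_to_le32( ~DMA_EOL );

        ring->tail_cmd_ofs = new_cmd_ofs;
}

The only part that really matters is the wmb() sitting between the two
writes; everything else is bookkeeping.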
Now that the original path of flushing in batches is working again with the
new code, I've committed everything.  The new path is actually kind of
working, but I still run into lockups and sometimes garbage being drawn.
One of the annoying things is that BM_GUI_TABLE can be set to some seemingly
random value if a lockup happens (are there other cases?), so I've added a
sanity check in the flush function.  Since we don't know where the card will
stop, we have to restart it with whatever address is in BM_GUI_TABLE when it
goes idle, so I check to make sure that address is actually within the
address range of the ring.  I've also added code to reset the ring structure
and BM_GUI_TABLE in engine_reset to account for this.

Some other notes of interest: the ring_wait is really not necessary, since
128 16kB buffers can use a max of 512 of the 1024 descriptors; we'd need 4MB
of buffers to fill the ring (the arithmetic is spelled out in the P.S.
below).  I suppose we could replace the RING_SPACE_CHECK macro with a check
for enough buffers to start a new pass if the card is idle; then we probably
wouldn't need the flush in freelist_get.

At the moment, the performance gain is modest: less than 1 fps in gears, and
with the quake3 timedemo (vertex lighting) I went from a previous best of
19.4 fps to 20.1 fps.  I think the texture management code is still the main
performance problem for games at the moment (though with the quake3 demo it
mainly crops up with lightmap lighting).

-- 
Leif Delgass
http://www.retinalburn.net
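P.S.  In case anyone wants to check the ring-space arithmetic above, here is
the back-of-the-envelope version.  The constant names are made up, and it
assumes (as implied above) that one descriptor table entry covers at most
4kB of buffer data:

#define RING_ENTRIES       1024              /* descriptors in the ring */
#define DESC_MAX_BYTES     (4 * 1024)        /* data per descriptor (assumed) */
#define DMA_BUFFER_COUNT   128
#define DMA_BUFFER_BYTES   (16 * 1024)

/* 16kB / 4kB = 4 descriptors per buffer, so 128 buffers can occupy at
 * most 128 * 4 = 512 of the 1024 entries.  Filling the whole ring would
 * take 1024 * 4kB = 4MB worth of buffers.
 */
#define DESCS_PER_BUFFER   (DMA_BUFFER_BYTES / DESC_MAX_BYTES)
#define MAX_DESCS_IN_USE   (DMA_BUFFER_COUNT * DESCS_PER_BUFFER)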