On 25/11/14 22:05, Andy Ritger wrote:
On Tue, Nov 25, 2014 at 10:57:44AM -0500, Ilia Mirkin wrote:
On Mon, Nov 24, 2014 at 8:33 PM, Andy Ritger <[email protected]> wrote:
On Fri, Nov 21, 2014 at 01:39:55AM -0500, Ilia Mirkin wrote:
On Fri, Nov 21, 2014 at 1:16 AM, Andy Ritger <[email protected]> wrote:
Hi Ilia,

Actually 0x90b8 is different from the copy engine.  I'm not very familiar
with it, but 0x90b8 is an engine for performing LZO decompression as
part of performing the copy.  It has a variety of limitations (e.g.,
cannot handle blocklinear format), and was only in a few Fermi chips,
as I understand it.

According to our driver source, GF100, GF104, GF110, GF114, and GF116
all have it. [So GF106, GF108, GF117, GF119 don't have it.] We've only
had problems reported against GF116... and only for some people.

Hmm, some of our internal documentation is inconsistent about whether it
applies to GF100, but otherwise what I see matches your list.  I guess
"few" was not entirely accurate.

It is probably easiest to just ignore it.  You can distinguish this
decompress engine from a normal copy engine by looking at the CE capability
register on the falcon (0x00000650).  If bit 2 is '1', then the falcon is
a decompress engine.

I presume you mean a +0x650 register on the pcopy engines (0x104000
and 0x105000). I only have access to the GF108 right now, which
returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
0x104000 for copy on the GF108...

Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.

FWIW, the other capability bits are:
bit 0: "DMACOPY_SUPPORTED"
bit 1: "PIXREMAP_SUPPORTED"

(I think PIXREMAP_SUPPORTED is in reference to the component remapping
controlled by methods 0x00000700, 0x00000704, and 0x00000708 in the
copy engine class).
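
As a rough illustration of the capability bits described above, the following Python sketch decodes a raw value read from the +0x650 register. The names for bit 0 and bit 1 are the ones quoted in this thread; `DECOMPRESS` for bit 2 and the helper names `decode_ce_caps` and `is_decompress_engine` are made up for this example and are not taken from any driver.

```python
# Capability bits of the copy-engine falcon register at +0x650.
DMACOPY_SUPPORTED = 1 << 0   # name as quoted in this thread
PIXREMAP_SUPPORTED = 1 << 1  # name as quoted in this thread
DECOMPRESS = 1 << 2          # hypothetical name; the thread only says bit 2 marks a decompress engine


def decode_ce_caps(value):
    """Return the names of the capability bits set in a raw register value."""
    names = []
    if value & DMACOPY_SUPPORTED:
        names.append("DMACOPY_SUPPORTED")
    if value & PIXREMAP_SUPPORTED:
        names.append("PIXREMAP_SUPPORTED")
    if value & DECOMPRESS:
        names.append("DECOMPRESS")
    return names


def is_decompress_engine(value):
    """True if bit 2 is set, i.e. the falcon is a decompress engine."""
    return bool(value & DECOMPRESS)


# Values reported for the GF108 in this thread.
print(hex(0x104000 + 0x650), decode_ce_caps(3))
print(hex(0x105000 + 0x650), decode_ce_caps(4))
```

Under this reading, the value 3 at 0x104650 is an ordinary copy engine with both copy features, and the value 4 at 0x105650 is a decompress engine.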

Neat. We went around and grabbed that 0x650 register on a bunch of
GPUs, see the CE* columns at:

http://envytools.readthedocs.org/en/latest/hw/gpu.html#fermi-kepler-maxwell-family

I don't see the 0x650 register values on that page.  Maybe I'm not
looking at the right place?

The table at the bottom, CE0-CE2 columns.


It looks like it's actually returning 0 on both "copy" engines for a
bunch of those cards -- GF100, GF104, GF114, probably GF110. But other
cards have them as either 3 or 4. I'm guessing that '0' should be
treated as if it were a '3' (or a '7')?

That's curious.  If I can get the table of where that reads zero, I can
try to investigate how to interpret that.

GF100, GF110, GF104, GF114.

Sounds obvious to me - the caps register wasn't needed before GF106 and thus didn't exist.

I don't think there's any more need for information here - we know how to tell apart a decompression engine by the caps register, *and* we know which cards have it (GF106, GF116, GF108 - unless someone resurrected it on GKsomething or GMsomething). We also know the difference between a normal copy engine and a decompression engine (basically: all dedicated copy hw is missing and replaced by dedicated decompression hw - effectively a completely different engine). In fact, given the decomp engine's simplicity, it shouldn't be hard at all to write firmware for it.

We are, however, quite curious about the purpose of an LZO1X decompression engine on a GPU...

Fun fact, I knew of the existence of decompression engines for some time, but never managed to locate them - I guess I didn't consider copy engines to warrant a second look on all possible GPUs...

Which brings me to ask: are there any more FIFO engines we somehow missed on Fermi+? There's apparently a new VIC class (0xa0b6), but I've never seen a VIC other than the MCP89 one (0x86b6).

AFAICS there's also one unknown enum value in NVRM's FIFO engine enum... (I know of GRAPH, CE0, CE1, CE2, VP1/VP2/MSPDEC, MSRCH/ME, MSPPP, BSP/MSVLD/MSDEC, MPEG, SOFTWARE, CIPHER/SEC, VIC, MSENC).


Curiously, a GF116 card that I thought was working fine on nouveau
actually has 3 for the first engine and 4 for the second. Perhaps it
just had enough VRAM that I never triggered the conditions required
for nouveau to use that second copy engine (we use it, when available,
for drm-initiated buffer moves).
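
A minimal sketch of how a driver could avoid handing buffer moves to such an engine, assuming the caps values have already been read from each engine's +0x650 register. This is not nouveau code; `pick_copy_engines` and `DECOMPRESS_BIT` are hypothetical names, and a caps value of 0 is passed through untouched because the thread leaves its interpretation open.

```python
DECOMPRESS_BIT = 1 << 2  # bit 2 set means "decompress engine", per this thread


def pick_copy_engines(caps_by_base):
    """Return base addresses of engines whose caps do not flag decompression.

    caps_by_base maps an engine base address (e.g. 0x104000) to the raw
    value read from its +0x650 capability register.
    """
    return [base for base, caps in sorted(caps_by_base.items())
            if not caps & DECOMPRESS_BIT]


# Values matching the GF116 card described above: 3 on the first engine,
# 4 on the second.
gf116_like = {0x104000: 3, 0x105000: 4}
usable = pick_copy_engines(gf116_like)
print([hex(base) for base in usable])
```

With the GF116 values above, only the engine at 0x104000 is kept.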

Interesting.  Would that explain why this hasn't manifested on configs
other than the GF116 user reports?

Thanks,
- Andy

From my admittedly limited understanding, both 0x104000 and 0x105000
appear to be falcon engines, where the fuc is presumably able to drive
some underlying hardware. The actual fifo methods are implemented in
the fuc, which in turn does iowr/etc commands.

Are you saying that the "decompress" engine (at 0x105000 right?) has a
different piece of hardware behind it than the copy engine at
0x104000, or does NVIDIA simply provide different fuc for it that
exposes somewhat different functionality via FIFO methods?

There is definitely a falcon at the frontend, and there is different
falcon ucode for "normal" copy engine versus the "decompress" engine.
But, I don't know off hand what dedicated hardware, if any, is behind it.

Seems likely that the HW is different, since it'd be madness to try to
do decompression in the falcon code itself. (Not to say that the ISA
isn't suited to it, just they have relatively slow clocks.) mwk is in
the process of working it all out.

   -ilia
_______________________________________________
Nouveau mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/nouveau


