These documentation improvements are very welcome; here are a few comments from me.

Quoting [email protected] (2018-02-16 16:04:22)
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- RCS engine is for rendering 3D and performing compute, this is named
> `I915_EXEC_DEFAULT` in user space.

I'd call out the existence of I915_EXEC_RENDER here and introduce
I915_EXEC_DEFAULT on its own line.

> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user
> space.
> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in
> user space
> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user
> space.
> +
> +The Intel GPU family is a familiy of integrated GPU's using Unified Memory
> +Access. For having the GPU "do work", user space will feed the GPU batch
> buffers
> +via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER`,
> `DRM_IOCTL_I915_GEM_EXECBUFFER2`
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will instruct
> the

I'd also call out DRM_IOCTL_I915_GEM_EXECBUFFER as the legacy submission
method and primarily mention I915_GEM_EXECBUFFER2_WR.

> +GPU to perform work (for example rendering) and that work needs memory from
> +which to read and memory to which to write. All memory is encapsulated within
> +GEM buffer objects (usually created with the ioctl
> DRM_IOCTL_I915_GEM_CREATE).
> +An ioctl providing a batchbuffer for the GPU to create will also list all GEM
> +buffer objects that the batchbuffer reads and/or writes.
> +

To keep things in chronological order, maybe introduce the hardware
contexts first, and only then go to PPGTT?

> +The GPU has its own memory management and address space. The kernel driver
> +maintains the memory translation table for the GPU. For older GPUs (i.e.
> those
> +before Gen8), there is a single global such translation table, a global
> +Graphics Translation Table (GTT).
> For newer generation GPUs each hardware
> +context has its own translation table, called Per-Process Graphics
> Translation
> +Table (PPGTT). Of important note, is that although PPGTT is named
> per-process it
> +is actually per hardware context. When user space submits a batchbuffer, the
> kernel
> +walks the list of GEM buffer objects used by the batchbuffer and guarantees
> +that not only is the memory of each such GEM buffer object resident but it is
> +also present in the (PP)GTT. If the GEM buffer object is not yet placed in
> +the (PP)GTT, then it is given an address. Two consequences of this are:
> +the kernel needs to edit the batchbuffer submitted to write the correct
> +value of the GPU address when a GEM BO is assigned a GPU address and
> +the kernel might evict a different GEM BO from the (PP)GTT to make address
> +room for a GEM BO.
> +
> +Consequently, the ioctls submitting a batchbuffer for execution also include
> +a list of all locations within buffers that refer to GPU-addresses so that
> the
> +kernel can edit the buffer correctly. This process is dubbed relocation. The
> +ioctls allow user space to provide what the GPU address could be. If the
> kernel
> +sees that the address provided by user space is correct, then it skips
> performing
> +relocation for that GEM buffer object. In addition, the ioctl's provide to
> what
> +addresses the kernel relocates each GEM buffer object.
> +
> +There is also an interface for user space to directly specify the address
> location
> +of GEM BO's, the feature soft-pinning and made active within an execbuffer2
> +ioctl with EXEC_OBJECT_PINNED bit up. If user-space also specifies
> I915_EXEC_NO_RELOC,
> +then the kernel is to not execute any relocation and user-space manages the
> address
> +space for its PPGTT itself.
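
As an aside on the relocation paragraphs quoted above: a small illustration
might help readers here. Below is a rough sketch in plain user-space C of the
"is the presumed address correct?" check and the patching step. It is not the
i915 code. `struct sketch_reloc`, `sketch_relocate()` and the `bo_addr` table
are made-up names for illustration, only loosely modelled on what the text
describes, and the sketch assumes 64-bit addresses stored in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one relocation entry. */
struct sketch_reloc {
	size_t   target;          /* index of the GEM BO whose address is needed */
	size_t   offset;          /* byte offset in the batch holding that address */
	uint64_t delta;           /* added to the BO's GPU address */
	uint64_t presumed_offset; /* user space's guess of the BO's GPU address */
};

/*
 * Patch the batch wherever user space's guess was wrong.
 * bo_addr[i] is the GPU address assigned to BO i (assumed to be given).
 * Returns the number of entries rewritten, or -1 on a bad entry.
 */
static int sketch_relocate(uint8_t *batch, size_t batch_len,
			   struct sketch_reloc *relocs, size_t nr_relocs,
			   const uint64_t *bo_addr, size_t nr_bos)
{
	int patched = 0;

	assert(batch && relocs && bo_addr);

	for (size_t i = 0; i < nr_relocs; i++) {
		struct sketch_reloc *r = &relocs[i];
		uint64_t value;

		/* Reject entries that point outside the BO table or the batch. */
		if (r->target >= nr_bos || r->offset > batch_len ||
		    batch_len - r->offset < sizeof(value))
			return -1;

		/* Guess was right: the batch already holds the correct value. */
		if (r->presumed_offset == bo_addr[r->target])
			continue;

		value = bo_addr[r->target] + r->delta;
		memcpy(batch + r->offset, &value, sizeof(value));

		/* Report the real address back so the next guess can match. */
		r->presumed_offset = bo_addr[r->target];
		patched++;
	}
	return patched;
}
```

The point of the sketch is only that a correct guess makes the loop body a
no-op for that entry, which is why user space that keeps its addresses stable
saves the kernel work on every submission.
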
> The advantage of user space handling address
> space is
> +that then the kernel does far less work and user space can safely assume that
> +GEM buffer object's location in GPU address space do not change.
> +
> +Starting in Gen6, Intel GPU's support hardware contexts. A GPU hardware
> context
> +represents GPU state that can be saved and restored. When user space uses a
> hardware
> +context, it does not need to restore the GPU state at the start of each
> batchbuffer
> +because the kernel directly the GPU to load the state from the hardware
> context.
> +Hardware contexts allow for much greater isolation between processes that
> use the GPU.
> +
> +Batchbuffer Submission
> +----------------------
> +
> +Depending on GPU generation, the i915 kernel driver will submit batchbuffers
> +in one of the several ways. However, the top code logic is shared for all
> +methods. They key function, i915_gem_do_execbuffer() essentially converts
> +the ioctl command to an internal data structure which is then added to a
> queue
> +which is processed elsewhere to give the job to the GPU; the details of
> +i915_gem_do_execbuffer() are covered in `Common Code`_.
> +
> +
> +Common Code
> +~~~~~~~~~~~
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> + :doc: User command execution
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> + :functions: i915_gem_do_execbuffer

I'm not sure about referring to internal functions, as they're bound to
change often. No strong feelings on this; I just see that this will be easy
to miss when changing the related code.

> +
> +Batchbuffer Submission Varieties
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +As stated before, there are several varieties in how to submit batchbuffers
> to the GPU;
> +which one in use is controlled by function pointer values in the c-struct
> intel_engine_cs
> +(defined in drivers/gpu/drm/i915/intel_ringbuffer.h)
> +
> +- request_alloc
> +- submit_request

Same here.
Because this text lives in a separate file, I'm not sure this level of
detail is going to be kept up to date when the actual code changes.

> +
> +The three varieties for submitting batchbuffer to the GPU are the following.
> +
> +1. Batchbuffers are subbmitted directly to a ring buffer; this is the most
> basic way to submit batchbuffers to the GPU and is for generations strictly
> before Gen8. When batchbuffers are submitted this older way, their contents
> are checked via Batchbuffer Parsing, see `Batchbuffer Parsing`_.

Just for editing and reading pleasure, there must be a way of breaking long
lines in lists.

But more importantly, do refer to Command Parser/Parsing, as the code uses
"cmd parser" (a.k.a. "command parser") extensively.

Regards, Joonas

_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
