These documentation improvements are very welcome; here are a few
comments from me.

Quoting (2018-02-16 16:04:22)
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_DEFAULT` in user space.

I'd call out the existence of I915_EXEC_RENDER here and introduce
I915_EXEC_DEFAULT on its own line.
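
A rough C sketch of the distinction, in case it helps with the wording.
The selector values are what I believe include/uapi/drm/i915_drm.h uses,
redefined locally under made-up SKETCH_ names so the snippet stands alone;
the helper functions are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for the uapi engine selector values (assumed values,
 * deliberately not taken from the real header). */
enum sketch_exec_ring {
	SKETCH_EXEC_DEFAULT = 0,
	SKETCH_EXEC_RENDER  = 1,
	SKETCH_EXEC_BSD     = 2,
	SKETCH_EXEC_BLT     = 3,
	SKETCH_EXEC_VEBOX   = 4,
};

/* Hypothetical helper: map a selector to the engine name used in the text.
 * DEFAULT is its own selector value that ends up on the render engine. */
static const char *sketch_engine_name(unsigned int ring)
{
	switch (ring) {
	case SKETCH_EXEC_DEFAULT: /* default resolves to the render engine */
	case SKETCH_EXEC_RENDER:
		return "RCS";
	case SKETCH_EXEC_BSD:
		return "VCS";
	case SKETCH_EXEC_BLT:
		return "BCS";
	case SKETCH_EXEC_VEBOX:
		return "VECS";
	default:
		return NULL; /* unknown selector */
	}
}

/* Usage: DEFAULT and RENDER are different selectors for the same engine. */
static void sketch_engine_demo(void)
{
	assert(SKETCH_EXEC_DEFAULT != SKETCH_EXEC_RENDER);
	assert(strcmp(sketch_engine_name(SKETCH_EXEC_DEFAULT),
		      sketch_engine_name(SKETCH_EXEC_RENDER)) == 0);
	assert(strcmp(sketch_engine_name(SKETCH_EXEC_BLT), "BCS") == 0);
	assert(sketch_engine_name(7) == NULL);
}
```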

> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
> +
> +The Intel GPU family is a familiy of integrated GPU's using Unified Memory
> +Access. For having the GPU "do work", user space will feed the GPU batch buffers
> +via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER`, 
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will instruct the

I'd also call out DRM_IOCTL_I915_GEM_EXECBUFFER as the legacy submission
method and primarily mention I915_GEM_EXECBUFFER2_WR.

> +GPU to perform work (for example rendering) and that work needs memory from
> +which to read and memory to which to write. All memory is encapsulated within
> +GEM buffer objects (usually created with the ioctl 
> +An ioctl providing a batchbuffer for the GPU to create will also list all GEM
> +buffer objects that the batchbuffer reads and/or writes.
> +

In chronological order, maybe first introduce the hardware contexts?
Only then go to PPGTT.

> +The GPU has its own memory management and address space. The kernel driver
> +maintains the memory translation table for the GPU. For older GPUs (i.e. those
> +before Gen8), there is a single global such translation table, a global
> +Graphics Translation Table (GTT). For newer generation GPUs each hardware
> +context has its own translation table, called Per-Process Graphics Translation
> +Table (PPGTT). Of important note, is that although PPGTT is named per-process it
> +is actually per hardware context. When user space submits a batchbuffer, the kernel
> +walks the list of GEM buffer objects used by the batchbuffer and guarantees
> +that not only is the memory of each such GEM buffer object resident but it is
> +also present in the (PP)GTT. If the GEM buffer object is not yet placed in
> +the (PP)GTT, then it is given an address. Two consequences of this are:
> +the kernel needs to edit the batchbuffer submitted to write the correct
> +value of the GPU address when a GEM BO is assigned a GPU address and
> +the kernel might evict a different GEM BO from the (PP)GTT to make address
> +room for a GEM BO.
> +
> +Consequently, the ioctls submitting a batchbuffer for execution also include
> +a list of all locations within buffers that refer to GPU-addresses so that the
> +kernel can edit the buffer correctly. This process is dubbed relocation. The
> +ioctls allow user space to provide what the GPU address could be. If the kernel
> +sees that the address provided by user space is correct, then it skips performing
> +relocation for that GEM buffer object. In addition, the ioctl's provide to what
> +addresses the kernel relocates each GEM buffer object.
> +
> +There is also an interface for user space to directly specify the address location
> +of GEM BO's, the feature soft-pinning and made active within an execbuffer2
> +ioctl with EXEC_OBJECT_PINNED bit up. If user-space also specifies 
> +then the kernel is to not execute any relocation and user-space manages the address
> +space for its PPGTT itself. The advantage of user space handling address space is
> +that then the kernel does far less work and user space can safely assume that
> +GEM buffer object's location in GPU address space do not change.
> +
> +Starting in Gen6, Intel GPU's support hardware contexts. A GPU hardware context
> +represents GPU state that can be saved and restored. When user space uses a hardware
> +context, it does not need to restore the GPU state at the start of each batchbuffer
> +because the kernel directly the GPU to load the state from the hardware context.
> +Hardware contexts allow for much greater isolation between processes that use the GPU.
> +
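
The relocation paragraphs above might be easier to follow with a small
sketch next to them. Something along these lines, where the struct is a
simplified, hypothetical stand-in for a relocation entry (not the real uapi
layout) and the function only illustrates the "skip if the presumed address
is correct, otherwise patch the batch and report the new address" logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation entry (field names are illustrative). */
struct sketch_reloc {
	uint32_t batch_offset;    /* byte offset in the batch to patch */
	uint32_t delta;           /* added to the target's GPU address */
	uint64_t presumed_offset; /* user space's guess for the target address */
};

/* Returns true if the batch had to be rewritten for this entry. */
static bool sketch_relocate(uint8_t *batch, struct sketch_reloc *r,
			    uint64_t actual)
{
	uint64_t value;

	/* User space guessed right: nothing to patch. */
	if (r->presumed_offset == actual)
		return false;

	/* Write the real address (plus delta) into the batch... */
	value = actual + r->delta;
	memcpy(batch + r->batch_offset, &value, sizeof(value));

	/* ...and report back where the object actually ended up. */
	r->presumed_offset = actual;
	return true;
}

/* Usage: a correct guess is skipped, a stale guess gets patched. */
static void sketch_reloc_demo(void)
{
	uint8_t batch[16] = { 0 };
	struct sketch_reloc r = {
		.batch_offset = 8, .delta = 0x10, .presumed_offset = 0x1000,
	};
	uint64_t written;

	assert(!sketch_relocate(batch, &r, 0x1000));

	assert(sketch_relocate(batch, &r, 0x2000));
	memcpy(&written, batch + 8, sizeof(written));
	assert(written == 0x2010);
	assert(r.presumed_offset == 0x2000);
}
```

With soft-pinning, user space picks the addresses itself, so a step like
this would not be needed at all.
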
> +Batchbuffer Submission
> +----------------------
> +
> +Depending on GPU generation, the i915 kernel driver will submit batchbuffers
> +in one of the several ways. However, the top code logic is shared for all
> +methods. They key function, i915_gem_do_execbuffer() essentially converts
> +the ioctl command to an internal data structure which is then added to a queue
> +which is processed elsewhere to give the job to the GPU; the details of
> +i915_gem_do_execbuffer() are covered in `Common Code`_.
> +
> +
> +Common Code
> +~~~~~~~~~~~
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +   :doc: User command execution
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +   :functions: i915_gem_do_execbuffer

I'm not sure about referring to internal functions as they're bound to
change often. No strong feeling on this, I just expect this will be easy
to miss when changing the related code.

> +
> +Batchbuffer Submission Varieties 
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +As stated before, there are several varieties in how to submit batchbuffers to the GPU;
> +which one in use is controlled by function pointer values in the c-struct intel_engine_cs
> +(defined in drivers/gpu/drm/i915/intel_ringbuffer.h)
> +
> +- request_alloc
> +- submit_request

Same here. Because this lives in a separate file, I'm not sure this level
of detail is going to be kept up to date when the actual code changes.

> +
> +The three varieties for submitting batchbuffer to the GPU are the following.
> +
> +1. Batchbuffers are subbmitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is for generations strictly before Gen8. When batchbuffers are submitted this older way, their contents are checked via Batchbuffer Parsing, see `Batchbuffer Parsing`_.

Just for editing and reading pleasure, there must be a way of cutting
long lines in lists.

But more importantly, do refer to Command Parser/Parsing, as the code uses
"cmd parser" (aka command parser) extensively.

Regards, Joonas
Intel-gfx mailing list
