Re: [Intel-gfx] [RFC]: Arbitrated system memory bandwidth workarounds implementation for watermark.

Mahesh Kumar Mon, 03 Apr 2017 00:54:07 -0700

Hi Maarten,
Sorry for the delay in replying.

In Option 3:

We know the maximum number of planes for any given CRTC. We also know the maximum downscaling supported (only downscaling affects WM) per pipe/plane.


    Maximum downscaling per plane can be:

max plane hscale * max plane vscale, which is 2.99 x 2.99 in GEN9


    This scaling should also be less than cdclk / pixel clock.

    The same limitation applies to pipe downscaling as well.

The following patch implements the limitation related to cdclk/pixel_clock (max supported pixel rate).


        https://patchwork.freedesktop.org/patch/141210/

So our final downscaling-related limitation will be something like

    min((max_plane_hscale * max_plane_vscale) * (max_pipe_hscale * max_pipe_vscale), (cdclk / pixel_clock))

which for GEN9 is

    min(2.99 * 2.99 * 2.99 * 2.99, (cdclk / pixel_clock))
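
As an illustration only, the limit above can be written as a small Python helper. The function and constant names are made up for this sketch and are not taken from the i915 driver; the 2.99 per-axis figure is the GEN9 value quoted above.

```python
# Per-axis downscale limits quoted in this mail for GEN9.
GEN9_MAX_PLANE_SCALE = 2.99
GEN9_MAX_PIPE_SCALE = 2.99


def max_downscale(cdclk, pixel_clock):
    """Return min(plane limit * pipe limit, cdclk / pixel_clock).

    cdclk and pixel_clock must be in the same unit (e.g. both in MHz).
    """
    hw_limit = (GEN9_MAX_PLANE_SCALE ** 2) * (GEN9_MAX_PIPE_SCALE ** 2)
    clock_limit = cdclk / pixel_clock
    return min(hw_limit, clock_limit)


if __name__ == "__main__":
    # 4K@60 style numbers: the clock ratio is the binding limit.
    print(max_downscale(594, 545))
```

With a pixel clock close to cdclk the clock ratio dominates; the 2.99-based product only matters for very low pixel clocks.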

During modeset we can compute the same & enable the WA.

One of the memory bandwidth limitations is: if Y-tile is enabled in any of the planes & total display bandwidth is > 20%, then enable the Y-tile specific WA. The 20% mark will be hit only if the connected DRAM is of lower frequency OR high resolution & high refresh-rate monitors are connected.

For the X-tile WA this % is 35% OR 60%, so we have pretty slim chances of hitting that situation.

For e.g. a 4K@60 display will have a pixel clock of about 540-545MHz, & cdclk will be 594MHz.

If 1600MHz dual-channel DRAM is connected to the system, then available system bandwidth will be:


    1600 * 2 * 8 = 25600,

if 3 planes are enabled & all 3 pipes are enabled, in that case the total display bandwidth requirement will be approx.

545 * 3 * 3 = 4905, which is roughly 20% (19.16%) of total available bandwidth, & the Y-tile WA may be needed.

If downscaling is enabled, max supported downscaling will be (594 / 545), about 1.08x,


in such a case the max display bandwidth requirement may reach

545 * 1.08 * 3 * 3 = 5297.4, which is 20.69%, & the Y-tile WA will be needed.


for higher frequency DRAM this % will be even less
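
The arithmetic above can be reproduced with a short Python sketch. The helper names are invented for illustration, and the units simply follow the mail (DRAM rate times channels times 8 for the system side, pixel clock times planes times pipes for the display side).

```python
def system_bandwidth(dram_rate, channels, bytes_per_transfer=8):
    """Available system bandwidth, e.g. 1600 * 2 * 8 = 25600."""
    return dram_rate * channels * bytes_per_transfer


def display_bandwidth(pixel_clock, planes, pipes, downscale=1.0):
    """Display requirement as computed in the mail: clock * downscale * planes * pipes."""
    return pixel_clock * downscale * planes * pipes


def display_share_percent(display_bw, system_bw):
    """Display requirement as a percentage of available system bandwidth."""
    return 100.0 * display_bw / system_bw


if __name__ == "__main__":
    sys_bw = system_bandwidth(1600, 2)
    for scale in (1.0, 1.08):
        bw = display_bandwidth(545, 3, 3, downscale=scale)
        print(scale, bw, round(display_share_percent(bw, sys_bw), 2))
```

Passing a higher DRAM rate to `system_bandwidth` lowers the percentage for the same display configuration, which is the point made in the line above.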

So only when total bandwidth goes above 20% & Y-tile is enabled may we need to take the mutex of all CRTCs, so there will be a fairly low chance of holding any lock.


Regards,

-Mahesh

On Tuesday 28 March 2017 01:38 PM, Maarten Lankhorst wrote:

Op 27-03-17 om 17:52 schreef Mahesh Kumar:

*Arbitrated system bandwidth workarounds for watermark.*

All GEN-9 based platforms require watermark related WA to be enabled if the display memory bandwidth requirement exceeds XX% of total available system memory bandwidth.

This XX% depends on multiple factors.
*e.g.* if all the enabled planes have X-tiled or linear memory then,
                    XX = 60
        if any Y-tiled plane is enabled then
                    XX = 20 etc.
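
A minimal Python sketch of the threshold selection described above. It models only the two cases listed here (60 for X-tiled/linear, 20 when any Y-tiled plane is enabled); the "etc." cases are not covered, and the tiling labels and function names are hypothetical.

```python
# Hypothetical tiling labels, used only in this sketch.
LINEAR, X_TILED, Y_TILED = "linear", "x-tiled", "y-tiled"


def wa_threshold_percent(plane_tilings):
    """Pick XX: 20 if any enabled plane is Y-tiled, otherwise 60."""
    if any(t == Y_TILED for t in plane_tilings):
        return 20
    return 60


def wa_required(display_bw, system_bw, plane_tilings):
    """True when display bandwidth exceeds XX% of system bandwidth."""
    share = 100.0 * display_bw / system_bw
    return share > wa_threshold_percent(plane_tilings)


if __name__ == "__main__":
    print(wa_required(5297.4, 25600, [X_TILED, Y_TILED]))
    print(wa_required(5297.4, 25600, [X_TILED, LINEAR]))
```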

In the current implementation of the workarounds we enable the maximum WA (i.e. add 15us latency during WM calculation) irrespective of whether the workaround is required OR not.

Total display bandwidth requirement is the sum of the display requirements of the individual pipes. In order to calculate the correct BW requirement, the plane configuration of any pipe should not change during the calculation.

To implement & optimize the above requirement many implementations are possible; I'm proposing a few options.

Please review & let know which option is better to implement WA's.

*Option 1:*

    Use connection_mutex (this will change to i915 specific lock only
    that is available in atomic design) to serialize all the commits.
    If memory bandwidth WA is changing then get all crtc_states for
    calculating watermark values.
    *Pros:*

      * In each flip optimum WM values (not more than the required
        value) will be used.

    *Cons:*

      * This approach will serialize all the flips so there will be
        performance impact, in case of blocking commits this impact
        will be even worse, e.g. three display with refresh-rate of
        30fps, 60fps & 90fps.
      * If commit is going-on in 30FPS display, all other flip will
        be blocked & frames in 60 & 90fps display will be
        dropped/blocked.

*Option 2:*

    Use two levels of system bandwidth check, once during calculation
    & second during commit.
    During intel_atomic_check (as part of compute_ddb) don’t hold any
    system level mutex, instead hold WM mutex & compute system
    bandwidth requirement. If WA is changing then get crtc_state of
    all other pipes & go  ahead with commit.
    During intel_atomic_commit, again take wm_mutex & recalculate
    complete system bandwidth requirement. If requirement is changed
    in a way that computed WM are not valid anymore fail the flip.
    Update the bandwidth requirement for each plane in global state
    (dev_priv->wm) so other flips don’t need to recalculate it.

    *Pros:*

      * It reduces critical section time.
      * Still optimum use of available DDB & optimum WM values are used.

    *Cons:*

      * If memory bandwidth WA are changing very frequently then
        there will be many flip failures which will impact the
        performance.
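
To make the two-level check of Option 2 concrete, here is a toy Python model. It is not driver code: `WmState`, `check` and `commit` are invented names, a `threading.Lock` stands in for wm_mutex, and a single 20% threshold is assumed for simplicity.

```python
import threading


class WmState:
    """Toy stand-in for the global watermark state (dev_priv->wm in the RFC)."""

    def __init__(self, system_bw, threshold_pct=20.0):
        self.lock = threading.Lock()  # plays the role of wm_mutex
        self.system_bw = system_bw
        self.threshold_pct = threshold_pct
        self.pipe_bw = {}  # committed bandwidth requirement per pipe

    def _wa_needed(self, pipe_bw):
        total = sum(pipe_bw.values())
        return 100.0 * total / self.system_bw > self.threshold_pct

    def check(self, pipe, new_bw):
        """First level: compute the WA decision under the lock, then drop it."""
        with self.lock:
            return self._wa_needed({**self.pipe_bw, pipe: new_bw})

    def commit(self, pipe, new_bw, wa_at_check):
        """Second level: recompute; fail the flip if the decision changed."""
        with self.lock:
            proposed = {**self.pipe_bw, pipe: new_bw}
            if self._wa_needed(proposed) != wa_at_check:
                return False  # watermarks computed at check time are stale
            self.pipe_bw = proposed
            return True
```

In this model a flip fails only when another pipe's commit lands between `check` and `commit` and moves the total across the threshold, which corresponds to the failure case listed under Cons.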


*Option 3:*

    Compute maximum bandwidth requirement during modeset.
    i.e. if modeset is of 1080p @60fps & maximum plane in CRTC are
    3,  with maximum supported downscale amount “XX.YY” (defined by
    min of cdclk/crtc_clock  & max(hscale x vscale)) then max
    bandwidth requirement for CRTC will be
    (1080p x 60 x 3 x XX.YY).

    Now during flip if there is any change which will change the WA
    (e.g. tiling change) then take wm_mutex lock & recalculate
    complete bandwidth requirement. If WA is changing then get
    crtc_state of all other pipes & go ahead with commit. (if total
    display memory BW % is  less than lowest % to enable WA i.e. 20%,
    then no need to recompute)
    Update per-CRTC bandwidth requirement in global state so other
    flips don’t need to recalculate each time.

    *Pros:*

      * All CRTC can flip independently until there is change which
        will impact WA.
      * No locking until potential WM WA change.

    *Cons:*

      * If memory bandwidth WA is changing very frequently then there
        will be slight performance impact.
      * We may not be programming optimum WM values, which may have
        some power impact.
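
A rough Python sketch of the Option 3 bookkeeping. The names are hypothetical and the worst case follows the expression above (resolution x refresh x max planes x max downscale), ignoring blanking; all bandwidth values passed to the second helper are assumed to be in the same unit.

```python
def max_crtc_bandwidth(hdisplay, vdisplay, refresh_hz, max_planes, max_downscale):
    """Worst-case per-CRTC requirement computed once at modeset."""
    return hdisplay * vdisplay * refresh_hz * max_planes * max_downscale


def flip_needs_global_recompute(worst_case_per_crtc, system_bw,
                                lowest_threshold_pct=20.0):
    """A flip only needs wm_mutex and a full recompute when the summed
    worst case can exceed the lowest WA threshold (20% in the RFC)."""
    total = sum(worst_case_per_crtc)
    return 100.0 * total / system_bw > lowest_threshold_pct


if __name__ == "__main__":
    # 1080p @ 60 with 3 planes and no downscaling, in pixels per second.
    print(max_crtc_bandwidth(1920, 1080, 60, 3, 1.0))
```

As long as `flip_needs_global_recompute` is false, every CRTC can flip without taking any shared lock, which is the main advantage listed under Pros.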

If you think any other approach should be used please let me know that as well.

Option 4:
        Check if watermarks for the current pipe need global adjustment between the last commit and the current one; if not, do nothing.

        If they do, we could do 1 of the below:
                1. Blindly grab all other crtc state and do watermark reprogramming.
                2. If it does need adjustment, grab all other crtc's mutexes and see if we need to adjust watermark state. If we do, grab other affected crtc's states as well to perform watermark reprogramming.

        Perhaps add some elements of option 3 too? I like that one too.

~Maarten

_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
