On 25/04/2017 12:35, Chris Wilson wrote:
On Tue, Apr 25, 2017 at 12:13:04PM +0100, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursu...@intel.com>

Tool which emits batch buffers to engines with configurable
sequences, durations, contexts, dependencies and userspace waits.

Unfinished but shows promise so sending out for early feedback.

v2:
 * Load workload descriptors from files. (also -w)
 * Help text.
 * Calibration control if needed. (-t)
 * NORELOC | LUT to eb flags.
 * Added sample workload to wsim/workload1.

v3:
 * Multiple parallel different workloads (-w -w ...).
 * Multi-context workloads.
 * Variable (random) batch length.
 * Load balancing (round robin and queue depth estimation).
 * Workload delays and explicit sync steps.
 * Workload frequency (period) control.

v4:
 * Fixed queue-depth estimation by creating separate batches
   per engine when qd load balancing is on.
 * Dropped separate -s cmd line option. It can turn itself on
   automatically when needed.
 * Keep a single status page and lie about the write hazard
   as suggested by Chris.
 * Use batch_start_offset for controlling the batch duration.
   (Chris)
 * Set status page object cache level. (Chris)
 * Moved workload description to a README.
 * Tidied example workloads.
 * Some other cleanups and refactorings.

v5:
 * Master and background workloads (-W / -w).
 * Single batch per step is enough even when balancing. (Chris)
 * Use hars_petruska_f54_1_random IGT functions and seed to zero
   at start. (Chris)
 * Use WC cache domain when WC mapping. (Chris)
 * Keep seqnos 64 bytes apart in the status page. (Chris)
 * Add workload throttling and queue-depth throttling commands.
   (Chris)

v6:
 * Added two more workloads.
 * Merged RT balancer from Chris.

TODO list:

* No reloc!
* bb caching/reuse

Yeah, I know, but I have to make progress on the overall case as well, and I think it is getting close to good enough now. So now is the time to think of interesting workloads, and workload combinations.

 * Fence support.
 * Better error handling.
 * Less 1980's workload parsing.
 * More workloads.
 * Threads?
 * ... ?

Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozh...@intel.com>
---

+static enum intel_engine_id
+rt_balance(const struct workload_balancer *balancer,
+          struct workload *wrk, struct w_step *w)
+{
+       enum intel_engine_id engine;
+       long qd[NUM_ENGINES];
+       unsigned int n;
+
+       igt_assert(w->engine == VCS);
+
+       /* Estimate the "speed" of the most recent batch
+        *    (finish time - submit time)
+        * and use that as an approximation of the total remaining time for
+        * all batches on that engine. We try to keep the total remaining
+        * balanced between the engines.
+        */
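
For illustration only (this is not the code from the patch, and the
per-engine speed and queued counts are assumed rather than taken from the
status page handling), the idea in that comment boils down to picking the
engine with the least estimated remaining work:

static enum intel_engine_id
pick_least_loaded(const double *speed, const unsigned long *queued,
		  unsigned int num_engines)
{
	enum intel_engine_id best = 0;
	double best_remaining = speed[0] * (double)queued[0];
	unsigned int n;

	for (n = 1; n < num_engines; n++) {
		/* speed[n] ~ duration of the most recent batch on engine n,
		 * queued[n] ~ number of batches still outstanding there.
		 */
		double remaining = speed[n] * (double)queued[n];

		if (remaining < best_remaining) {
			best_remaining = remaining;
			best = n;
		}
	}

	return best;
}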

The next step for this would be to move from an instantaneous speed to an
average. I'm thinking of something like an exponentially decaying moving
average, just to make the estimation more robust.
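
Something along these lines, perhaps (just a sketch, not from the patch;
the struct and the alpha of 1/8 are made up for illustration):

struct engine_speed {
	double avg;	/* smoothed duration of one batch */
	int primed;	/* at least one sample recorded */
};

static void update_speed(struct engine_speed *es, double last_batch_duration)
{
	if (!es->primed) {
		es->avg = last_batch_duration;
		es->primed = 1;
	} else {
		/* avg += alpha * (sample - avg), with alpha = 1/8 */
		es->avg += (last_batch_duration - es->avg) / 8.0;
	}
}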

Do you think it would be OK to merge these two tools at this point and continue improving them in place?

Your balancer already looks like a solid step up from the queue-depth one. I checked today myself and, with what looks like a worst case of a VCS1 hog and a balancing workload running together, it gets the VCS2 utilisation to an impressive 85%.

As mentioned before, those stats can now be collected easily with:

  trace.pl --trace gem_wsim ...; perf script | trace.pl

I need to start pinging the relevant people for help with creating realistic workloads, and I am also entertaining the idea of trying balancing via stats exported directly from i915, just to see whether true vs estimated numbers would make a difference here.

+                       if (qd_throttle > 0 && balancer && balancer->get_qd) {
+                               unsigned int target;
+
+                               for (target = wrk->nr_steps - 1; target > 0;
+                                    target--) {

I think this should skip other engines.

if (wrk->steps[target].engine != engine)
        continue;

If you say so; I don't have a strong opinion on it. Would it perhaps be useful to have both options, to throttle globally and per-engine? I could easily add two different workload commands for that.
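
For illustration, the loop could take an optional per-engine filter along
these lines (the per_engine flag and the field names here are assumptions,
not taken from the posted patch):

	for (target = wrk->nr_steps - 1; target > 0; target--) {
		struct w_step *s = &wrk->steps[target];

		/* Only batch steps contribute to the queue depth. */
		if (s->type != BATCH)
			continue;

		/* Per-engine throttling skips work queued on other
		 * engines; global throttling would drop this check.
		 */
		if (per_engine && s->engine != engine)
			continue;

		/* ... wait on this step if over the queue depth limit ... */
	}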

Regards,

Tvrtko