We add a PMU counter to expose the number of requests currently executing
on the GPU.

This is useful for analyzing the overall load of the system.

  * Rebase.
  * Drop floating point constant. (Chris Wilson)

  * Change scale to 1024 for faster arithmetic. (Chris Wilson)
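As a rough illustration of why a power-of-two scale is cheaper, here is a minimal sketch. It assumes the counter accumulates "running requests x sample period" and that an average is reported in fixed-point units of 1/1024 of a request. All names (running_counter, running_sample, running_avg_scaled, RUNNING_SCALE_SHIFT) are hypothetical and are not taken from the actual i915 code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical fixed-point scale: 1024 == 1 << 10, so converting to and
 * from scaled units is a shift or mask instead of a multiply/divide by 1e6.
 */
#define RUNNING_SCALE_SHIFT 10
#define RUNNING_SCALE (1u << RUNNING_SCALE_SHIFT)

/* Hypothetical accumulator: integral of running requests over time. */
struct running_counter {
	uint64_t request_ns; /* sum of (running requests * sample period in ns) */
};

/* Assumed to be called once per sampling period with the current count. */
static void running_sample(struct running_counter *c, uint32_t running,
			   uint64_t period_ns)
{
	c->request_ns += (uint64_t)running * period_ns;
}

/*
 * Average number of running requests over a window, in 1/1024 units.
 * Scaling up is a left shift; only the division by the window remains.
 * Note: the shift can overflow for very large deltas; this sketch does
 * not guard against that.
 */
static uint64_t running_avg_scaled(uint64_t delta_request_ns, uint64_t delta_ns)
{
	assert(delta_ns != 0);
	return (delta_request_ns << RUNNING_SCALE_SHIFT) / delta_ns;
}

/* Integer part of a scaled value: a shift rather than a division. */
static uint64_t scaled_int(uint64_t v)
{
	return v >> RUNNING_SCALE_SHIFT;
}

/* Fractional part in 1/1024 units: a mask rather than a modulo. */
static uint64_t scaled_frac(uint64_t v)
{
	return v & (RUNNING_SCALE - 1);
}
```

For example, three requests running for 1 ms followed by none for 1 ms give a scaled average of 1536, i.e. 1 + 512/1024 = 1.5 requests. With a decimal scale such as 1e6, the integer/fraction split would need real 64-bit divisions (do_div()/div_u64() in the kernel), which are comparatively costly on 32-bit builds.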

Do we want these separate in the final push? Is there value in reverting
one but not the others? They seem to form a triumvirate.

I think the only benefit of keeping them separate, for me, was that rebasing was marginally easier. I can just as easily squash them if that is preferred.


