On Thu, 4 Apr 2019 08:41:29 -0700 Alyssa Rosenzweig <aly...@rosenzweig.io> wrote:
> > +/*
> > + * Returns true if the 2 jobs have exactly the same perfcnt context, false
> > + * otherwise.
> > + */
> > +static bool panfrost_perfcnt_job_ctx_cmp(struct panfrost_perfcnt_job_ctx *a,
> > +					 struct panfrost_perfcnt_job_ctx *b)
> > +{
> > +	unsigned int i, j;
> > +
> > +	if (a->perfmon_count != b->perfmon_count)
> > +		return false;
> > +
> > +	for (i = 0; i < a->perfmon_count; i++) {
> > +		for (j = 0; j < b->perfmon_count; j++) {
> > +			if (a->perfmons[i] == b->perfmons[j])
> > +				break;
> > +		}
> > +
> > +		if (j == b->perfmon_count)
> > +			return false;
> > +	}
> > +
>
> Would using memcmp() be cleaner here?

memcmp() does not account for the case where 2 jobs contain exactly the
same perfmons but in a different order. This being said, it's rather
unlikely to happen, so maybe we can accept the perf penalty for that
case.

>
> > +	if (panfrost_model_cmp(pfdev, 0x1000) >= 0)
>
> What does 0x1000 refer to here? I'm assuming maybe Bifrost, but it's not
> obvious... probably better to have a #define somewhere and use that (or
> an enum equivalently).

Yes, all numbers above 0xfff are bifrost GPUs. I'll add a macro.

>
> > +	/*
> > +	 * Due to PRLAM-8186 we need to disable the Tiler before we enable HW
> > +	 * counters.
> > +	 */
>
> What on earth is PRLAM-8186? :)
>
> Actually, wait, I can answer that -- old kbase versions had an errata
> list:
>
>	/* Write of PRFCNT_CONFIG_MODE_MANUAL to PRFCNT_CONFIG causes a
>	   instrumentation dump if
>	   PRFCNT_TILER_EN is enabled */
>	BASE_HW_ISSUE_8186,
>
> So that's why. If people want, I'm considering moving these errata
> descriptions back into the kernel where possible, since otherwise code
> like this is opaque.

Will copy the errata.

>
> > +	unsigned int nl2c, ncores;
> > +
> > +	/*
> > +	 * TODO: define a macro to extract the number of l2 caches from
> > +	 * mem_features.
> > +	 */
> > +	nl2c = ((pfdev->features.mem_features >> 8) & GENMASK(3, 0)) + 1;
> > +
> > +	/*
> > +	 * The ARM driver is grouping cores per core group and then
> > +	 * only using the number of cores in group 0 to calculate the
> > +	 * size. Not sure why this is done like that, but I guess
> > +	 * shader_present will only show cores in the first group
> > +	 * anyway.
> > +	 */
> > +	ncores = hweight64(pfdev->features.shader_present);
> > +
>
> Deja vu. Was this copypaste maybe?

Actually, that one is from me, hence the 'not sure why' part :).

>
> > +	(panfrost_model_cmp(pfdev, 0x1000) >= 0 ?
>
> There's that pesky 0x1000 again.
>
> > @@ -55,6 +63,15 @@ struct drm_panfrost_submit {
> >
> >  	/** A combination of PANFROST_JD_REQ_* */
> >  	__u32 requirements;
> > +
> > +	/** Pointer to a u32 array of perfmons that should be attached to the job. */
> > +	__u64 perfmon_handles;
> > +
> > +	/** Number of perfmon handles passed in (size is that times 4). */
> > +	__u32 perfmon_handle_count;
> > +
> > +	/** Unused field, should be set to 0. */
> > +	__u32 padding;
>
> Bleep blorp. If we're modifying _submit, we'll need to be swift about
> merging this ahead of the main code to make sure we don't break the
> UABI. Although I guess if we're just adding fields at the end, that's a
> nonissue.

Others are using the same "if data passed is smaller than expected size,
unassigned fields are zeroed" rule. That allows us to extend a struct
without breaking the ABI as long as zero is a valid value and does not
change the behavior compared to when the field was not present. This is
the case here: perfmon_handle_count = 0 means no perfmon attached to the
job, so the driver is acting like it previously was.
No need to get that part merged in the initial patch series IMO.

>
> > +struct drm_panfrost_block_perfcounters {
> > +	/*
> > +	 * For DRM_IOCTL_PANFROST_GET_PERFCNT_LAYOUT, encodes the available
> > +	 * instances for a specific given block type.
> > +	 * For DRM_IOCTL_PANFROST_CREATE_PERFMON, encodes the instances the
> > +	 * user wants to monitor.
> > +	 * Note: the bitmap might be sparse.
> > +	 */
> > +	__u64 instances;
> > +
> > +	/*
> > +	 * For DRM_IOCTL_PANFROST_GET_PERFCNT_LAYOUT, encodes the available
> > +	 * counters attached to a specific block type.
> > +	 * For DRM_IOCTL_PANFROST_CREATE_PERFMON, encodes the counters the user
> > +	 * wants to monitor.
> > +	 * Note: the bitmap might be sparse.
> > +	 */
> > +	__u64 counters;
> > +};
>
> I don't understand this. Aren't there more than 64 counters?
>
> > +struct drm_panfrost_get_perfcnt_layout {
> > +	struct drm_panfrost_block_perfcounters counters[PANFROST_NUM_BLOCKS];
> > +};
>
> --Oh. It's per-block. Got it.
>
> > + * Used to create a performance monitor. Each perfmonance monitor is assigned an
>
> Typo.

Will fix.

>
> ---
>
> Overall, this looks really great! Thank you! :)

Thanks a lot for your reviews. That was pretty damn fast!
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
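
To illustrate the memcmp() point from the reply above, here is a minimal sketch, not the driver code. It assumes a hypothetical struct perfmon_list standing in for the perfmon array of a job context, and both helper names are made up for the example. It contrasts an order-sensitive memcmp() comparison with an order-insensitive nested-loop comparison in the style of the quoted patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the perfmon list carried by a job context;
 * the real struct panfrost_perfcnt_job_ctx is not reproduced here.
 */
struct perfmon_list {
	const void **perfmons;
	size_t count;
};

/* Order-sensitive: true only if both arrays hold the same pointers in the same slots. */
static bool perfmon_list_equal_memcmp(const struct perfmon_list *a,
				      const struct perfmon_list *b)
{
	if (a->count != b->count)
		return false;
	if (a->count == 0)
		return true;

	return memcmp(a->perfmons, b->perfmons,
		      a->count * sizeof(*a->perfmons)) == 0;
}

/*
 * Order-insensitive: every entry of a must appear somewhere in b.
 * Assumption: neither list contains duplicate entries.
 */
static bool perfmon_list_equal_unordered(const struct perfmon_list *a,
					 const struct perfmon_list *b)
{
	size_t i, j;

	if (a->count != b->count)
		return false;

	for (i = 0; i < a->count; i++) {
		for (j = 0; j < b->count; j++) {
			if (a->perfmons[i] == b->perfmons[j])
				break;
		}

		/* Inner loop ran to completion: a->perfmons[i] is missing from b. */
		if (j == b->count)
			return false;
	}

	return true;
}
```

With two lists holding the same three pointers in opposite order, the memcmp() variant reports them as different while the nested-loop variant reports them as equal, which is the trade-off discussed in the reply.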