Hi Peter,

On 8/6/2018 3:12 PM, Peter Zijlstra wrote:
> On Mon, Aug 06, 2018 at 12:50:50PM -0700, Reinette Chatre wrote:
>> In my previous email I provided the details of the Cache Pseudo-Locking
>> feature implemented on top of resctrl. Please let me know if you would
>> like any more details about that. I can send you more materials.
>
> I've not yet had time to read..
>
>> BUG: sleeping function called from invalid context at
>> kernel/locking/mutex.c:748
>>
>> I thus continued to use the API with interrupts enabled and did the
>> following:
>>
>> Two new event attributes:
>> static struct perf_event_attr l2_miss_attr = {
>> 	.type = PERF_TYPE_RAW,
>> 	.config = (0x10ULL << 8) | 0xd1,
>
> Please use something like:
>
> 	X86_CONFIG(.event=0xd1, .umask=0x10),
>
> that's ever so much more readable.
>
>> 	.size = sizeof(struct perf_event_attr),
>> 	.pinned = 1,
>> 	.disabled = 1,
>> 	.exclude_user = 1
>> };
>>
>> static struct perf_event_attr l2_hit_attr = {
>> 	.type = PERF_TYPE_RAW,
>> 	.config = (0x2ULL << 8) | 0xd1,
>> 	.size = sizeof(struct perf_event_attr),
>> 	.pinned = 1,
>> 	.disabled = 1,
>> 	.exclude_user = 1
>> };
>>
>> Create the two new events using these attributes:
>> l2_miss_event = perf_event_create_kernel_counter(&l2_miss_attr, cpu,
>> 						  NULL, NULL, NULL);
>> l2_hit_event = perf_event_create_kernel_counter(&l2_hit_attr, cpu, NULL,
>> 						 NULL, NULL);
>>
>> Take measurements:
>> perf_event_enable(l2_miss_event);
>> perf_event_enable(l2_hit_event);
>> local_irq_disable();
>> /* Disable hardware prefetchers */
>> /* Loop through pseudo-locked memory */
>> /* Enable hardware prefetchers */
>> local_irq_enable();
>> perf_event_disable(l2_hit_event);
>> perf_event_disable(l2_miss_event);
>>
>> Read results:
>> l2_hits = perf_event_read_value(l2_hit_event, &enabled, &running);
>> l2_miss = perf_event_read_value(l2_miss_event, &enabled, &running);
>> /* Make results available in tracepoints */
>
> switch to .disabled=0 and try this for measurement:
>
> 	local_irq_disable();
> 	perf_event_read_local(l2_miss_event, &miss_val1, NULL, NULL);
> 	perf_event_read_local(l2_hit_event, &hit_val1, NULL, NULL);
> 	/* do your thing */
> 	perf_event_read_local(l2_miss_event, &miss_val2, NULL, NULL);
> 	perf_event_read_local(l2_hit_event, &hit_val2, NULL, NULL);
> 	local_irq_enable();
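
As an aside on the X86_CONFIG() suggestion quoted above: the open-coded
values and the X86_CONFIG() form should describe the same raw event, with
the event select in bits 0-7 and the unit mask in bits 8-15. Below is a
minimal userspace sketch of that packing. raw_config() is a hypothetical
helper used only for illustration; it is not a kernel API and not how
X86_CONFIG() itself is implemented.

```c
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not a kernel API):
 * pack an x86 raw event config from an event select (bits 0-7)
 * and a unit mask (bits 8-15), mirroring the open-coded
 * (umask << 8) | event expressions used in this thread.
 */
static uint64_t raw_config(uint8_t event, uint8_t umask)
{
	return ((uint64_t)umask << 8) | event;
}
```

With the values from this thread, raw_config(0xd1, 0x10) corresponds to the
l2_miss_attr config and raw_config(0xd1, 0x02) to the l2_hit_attr config.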

Thank you very much for taking a look and providing your guidance.

>
> You're running this on the CPU you created the event for, right?

Yes.

I've modified your suggestion slightly in an attempt to gain accuracy.
Now it looks like:

local_irq_disable();
/* disable hw prefetchers */
/* init local vars to loop through pseudo-locked mem */
perf_event_read_local(l2_hit_event, &l2_hits_before, NULL, NULL);
perf_event_read_local(l2_miss_event, &l2_miss_before, NULL, NULL);
/* loop through pseudo-locked mem */
perf_event_read_local(l2_hit_event, &l2_hits_after, NULL, NULL);
perf_event_read_local(l2_miss_event, &l2_miss_after, NULL, NULL);
/* enable hw prefetchers */
local_irq_enable();

With the above I do not see the impact of an interference workload
anymore but the results are not yet accurate:

pseudo_lock_mea-538   [002] ....   113.296084: pseudo_lock_l2: hits=4103 miss=2
pseudo_lock_mea-541   [002] ....   114.349343: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-544   [002] ....   115.410206: pseudo_lock_l2: hits=4101 miss=4
pseudo_lock_mea-551   [002] ....   116.473912: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-554   [002] ....   117.532446: pseudo_lock_l2: hits=4100 miss=5
pseudo_lock_mea-557   [002] ....   118.591121: pseudo_lock_l2: hits=4103 miss=2
pseudo_lock_mea-560   [002] ....   119.642467: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-563   [002] ....   120.698562: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-566   [002] ....   121.769348: pseudo_lock_l2: hits=4105 miss=4

In an attempt to improve the accuracy of the above I modified it to the
following:

/* create the two events as before in "enabled" state */
l2_hit_pmcnum = l2_hit_event->hw.event_base_rdpmc;
l2_miss_pmcnum = l2_miss_event->hw.event_base_rdpmc;
local_irq_disable();
/* disable hw prefetchers */
/* init local vars to loop through pseudo-locked mem */
l2_hits_before = native_read_pmc(l2_hit_pmcnum);
l2_miss_before = native_read_pmc(l2_miss_pmcnum);
/* loop through pseudo-locked mem */
l2_hits_after = native_read_pmc(l2_hit_pmcnum);
l2_miss_after = native_read_pmc(l2_miss_pmcnum);
/* enable hw prefetchers */
local_irq_enable();

With the above I seem to get the same accuracy as before, that is, the
results match the expected results quoted further below:

pseudo_lock_mea-557   [002] ....   155.402566: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-564   [002] ....   156.441299: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-567   [002] ....   157.478605: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-570   [002] ....   158.524054: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-573   [002] ....   159.561853: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-576   [002] ....   160.599758: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-579   [002] ....   161.645553: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-582   [002] ....   162.687593: pseudo_lock_l2: hits=4096 miss=0

Would a solution like this perhaps be acceptable to you? I will continue
testing and looking for any caveats in this solution.

>> With the above implementation and a 256KB pseudo-locked memory region I
>> obtain the following results:
>> pseudo_lock_mea-755   [002] ....   396.946953: pseudo_lock_l2: hits=4140
>
>> The above results are not accurate since they do not reflect the success
>> of the pseudo-locked region. Expected results are as we can currently
>> obtain (copying results from previous email):
>> pseudo_lock_mea-26090 [002] .... 61838.488027: pseudo_lock_l2: hits=4096
>
> Still fairly close.. only like 44 extra hits or 1% error.
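
To make the figures quoted above concrete: a 256KB region read once per
cache line would be expected to produce 4096 L2 hits if the line size is
64 bytes (an assumption, since the line size is not stated in this thread),
and 4140 measured hits is 44 over that, roughly 1.07%. A minimal sketch of
the arithmetic, using hypothetical helper names:

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of cache lines in a region, which is
 * the expected hit count when each line is read exactly once.
 * The 64-byte line size passed in by the caller is an assumption.
 */
static uint64_t expected_lines(uint64_t region_bytes, uint64_t line_bytes)
{
	return region_bytes / line_bytes;
}

/*
 * Hypothetical helper: how far a measured count overshoots the
 * expected count, as a percentage of the expected count.
 */
static double excess_percent(uint64_t measured, uint64_t expected)
{
	return 100.0 * ((double)measured - (double)expected) / (double)expected;
}
```

For example, expected_lines(256 * 1024, 64) gives 4096, and
excess_percent(4140, 4096) gives a little over 1.07.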

While the results do seem close, reporting a cache miss on memory that
is set up to be locked in cache is significant.

Thank you very much for your patience

Reinette