HyperZealot edited a comment on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434862551

I don't think the workloads provided by @rongzha1 are suitable for determining the memory bandwidth, for the following reasons:

1. For each input, the effective working set size is test_rows * num_cols * 4 bytes. Here's the working set size for each input:
   (1M, 20k, 512) = ~39MB
   (1M, 20k, 8) = ~0.6MB
   (800, 8, 61400) = ~1.90MB
2. The benchmark script runs 100 trials for each group of indices. If the cache is larger than the working set, the source data can be fully loaded into cache after paying for the compulsory misses during the 1st trial, so from the 2nd trial onward you're actually measuring the cache bandwidth.
3. Although the benchmark runs 100 trials for the same input, users usually only run once for each input, so they always pay for the compulsory misses and the bottleneck is the memory speed.
4. With a bit of searching I found that Skylake 8180 has a 38.5MB L3 cache, so both the (800, 8, 61400) and (1M, 20k, 8) workloads fit entirely in the L3 cache. Measurements performed on those workloads therefore cannot accurately compare how the different versions of the code perform as memory bandwidth consumption increases.
5. If you really want to showcase the effect of the different versions on num_cols=8, you could switch to a CPU with a smaller cache, or increase test_rows to make the working set larger than your L3 cache size.
6. I tested (50M, 1M, 8) (~30.5MB working set size, definitely greater than my L3 cache size) on my own machine and got 7.90 GB/s for the "for" version and 9.80 GB/s for the "memcpy" version.
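
The working-set arithmetic from point 1 and the cache comparison from point 4 can be checked with a short sketch. This is only an illustration, not part of the benchmark script: the helper names are hypothetical, the workload tuples are assumed to be ordered (num_rows, test_rows, num_cols), the element size is assumed to be 4 bytes (float32), and "38.5MB" is read as 38.5 * 2^20 bytes.

```python
# Sketch: working set = test_rows * num_cols * 4 bytes, compared against
# an assumed L3 cache size. All names here are hypothetical helpers.

BYTES_PER_ELEMENT = 4                  # assumption: float32 data
L3_CACHE_BYTES = 38.5 * 1024 * 1024    # assumption: 38.5MB L3, read as MiB


def working_set_bytes(test_rows, num_cols):
    """Bytes of source data touched by one trial."""
    return test_rows * num_cols * BYTES_PER_ELEMENT


def fits_in_cache(test_rows, num_cols, cache_bytes=L3_CACHE_BYTES):
    """True if the whole working set can stay resident in the cache."""
    return working_set_bytes(test_rows, num_cols) <= cache_bytes


# Assumed tuple order: (num_rows, test_rows, num_cols)
workloads = [
    (1_000_000, 20_000, 512),
    (1_000_000, 20_000, 8),
    (800, 8, 61_400),
]

for num_rows, test_rows, num_cols in workloads:
    size_mib = working_set_bytes(test_rows, num_cols) / (1024 * 1024)
    verdict = "fits in L3" if fits_in_cache(test_rows, num_cols) else "exceeds L3"
    print(f"({num_rows}, {test_rows}, {num_cols}): {size_mib:.2f} MiB, {verdict}")
```

Under these assumptions only the num_cols=512 workload exceeds the cache, so it is the only one of the three where repeated trials keep going to main memory.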