I've been looking into the effects of false sharing when writing to memory-mapped files. Intuitively, I would expect that to avoid false sharing between multiple threads writing (no reader threads) to the mapped file at different offsets at the same time, the writes should be cache-line-aligned and padded.
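To make the two write strategies concrete, here is a minimal sketch of the offset arithmetic (the class and method names here are mine, not from the attached benchmark; it assumes a 64-byte cache line, which is what Agrona's BitUtil.CACHE_LINE_LENGTH reports on this hardware). It counts how many distinct cache lines a run of successive writes touches under each stride:

```java
import java.util.HashSet;
import java.util.Set;

public class StrideDemo {
    static final int CACHE_LINE = 64; // assumed cache line size in bytes

    // Counts how many distinct cache lines `writes` successive
    // long-writes touch when the offset advances by `stride` bytes.
    static long distinctLines(int stride, int writes) {
        Set<Long> lines = new HashSet<>();
        for (long offset = 0, i = 0; i < writes; i++, offset += stride) {
            lines.add(offset / CACHE_LINE);
        }
        return lines.size();
    }

    public static void main(String[] args) {
        // Unpadded: consecutive 8-byte longs, so eight writes share one line.
        System.out.println(distinctLines(8, 16));          // prints 2
        // Padded: stride of one cache line, so no two writes share a line.
        System.out.println(distinctLines(CACHE_LINE, 16)); // prints 16
    }
}
```

With the unpadded stride, eight concurrent writers can land on the same 64-byte line, which is exactly the false-sharing pattern I expected to be slow.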
I wrote a simple JMH benchmark to test this. In one test, 5 threads (-t 5) write single longs to the file with no cache padding, using an AtomicLong to track the current index into the file. The other test still writes single longs, but adds 64 to the index counter on every iteration so that no two writes land on the same cache line.

Across multiple forks, the results were the opposite of what I expected, with the unpadded implementation performing non-trivially better:

Benchmark                    (blackhole)  Mode  Cnt    Score    Error  Units
AlignmentTest.testPadded               0  avgt   25  455.139 ± 59.933  ns/op
AlignmentTest.testUnpadded             0  avgt   25  374.866 ± 46.613  ns/op
AlignmentTest.testBlackHole            0  avgt   25  158.849 ± 20.971  ns/op

What's also interesting is that this trend holds even when running with a single thread (-t 1), though I could reason that this is due to reduced spatial locality:

Benchmark                    (blackhole)  Mode  Cnt   Score   Error  Units
AlignmentTest.testPadded               0  avgt   25  14.696 ± 0.008  ns/op
AlignmentTest.testUnpadded             0  avgt   25  11.145 ± 0.278  ns/op
AlignmentTest.testBlackHole            0  avgt   25   9.699 ± 0.008  ns/op

(testBlackHole measures only the time it takes to increment the index counter.)

Looking at the perf stats for both tests shows that the padded implementation performs better with regard to cache behaviour, as expected. In fact, the only metric that is worse in the padded implementation is the CPI, which is nearly 2x that of the unpadded implementation. As best I can tell, the assembly for the two implementations is the same as well.

Any ideas what else could cause padded writes to perform worse than writes that should cause tons of false sharing? This was tested on an Intel E5-2667 (Haswell) with HotSpot JDK 1.8_92. Benchmark source attached.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
package org.kavanagh.benchmark;

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.agrona.BitUtil;
import org.agrona.concurrent.AtomicBuffer;
import org.agrona.concurrent.UnsafeBuffer;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 200, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 30, time = 2, timeUnit = TimeUnit.SECONDS)
@Threads(5)
@Fork(5)
public class AlignmentTest {

    private static final long FILE_SIZE = 1073741824L; // 1 GiB

    static {
        System.setProperty(UnsafeBuffer.DISABLE_BOUNDS_CHECKS_PROP_NAME, "true");
    }

    @Param({ "0", "5" })
    public int blackhole;

    private AtomicBuffer buffer;
    private File tmpFile;
    private RandomAccessFile raf;
    private MappedByteBuffer bbuffer;
    private AtomicLong idx;

    @Setup
    public void init() throws IOException {
        tmpFile = new File("/dev/shm/test.tmp");
        tmpFile.deleteOnExit();
        raf = new RandomAccessFile(tmpFile, "rw");
        FileChannel fileChannel = raf.getChannel();
        bbuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, FILE_SIZE);
        buffer = new UnsafeBuffer(bbuffer);
        idx = new AtomicLong(0);
        System.out.println("Agrona Bounds Check: " + UnsafeBuffer.SHOULD_BOUNDS_CHECK);
    }

    @TearDown
    public void close() throws IOException {
        raf.close();
        tmpFile.delete();
    }

    private final void sleep() {
        if (blackhole > 0) {
            Blackhole.consumeCPU(blackhole);
        }
    }

    @Benchmark
    public int testBlackHole() {
        sleep();
        int offset = (int) idx.getAndAdd(BitUtil.CACHE_LINE_LENGTH);
        if (offset >= FILE_SIZE - BitUtil.CACHE_LINE_LENGTH) {
            idx.set(0);
            offset = 0;
        }
        return offset;
    }

    @Benchmark
    public int testUnpadded() {
        sleep();
        // Rollover is fine for this test
        int offset = (int) idx.getAndAdd(8);
        if (offset >= FILE_SIZE - 8) {
            idx.set(0);
            offset = 0;
        }
        buffer.putLong(offset, 123L);
        return offset;
    }

    @Benchmark
    public int testPadded() {
        sleep();
        // Rollover is fine for this test
        int offset = (int) idx.getAndAdd(BitUtil.CACHE_LINE_LENGTH);
        if (offset >= FILE_SIZE - BitUtil.CACHE_LINE_LENGTH) {
            idx.set(0);
            offset = 0;
        }
        buffer.putLong(offset, 123L);
        return offset;
    }
}