On Sat, 01 Oct 2011, Bron Gondwana wrote:
> On Sat, Oct 01, 2011 at 02:03:32PM -0300, Henrique de Moraes Holschuh wrote:
> > On Sat, 01 Oct 2011, Bron Gondwana wrote:
> > > Do you see a case where the alignment of the "const char *" for
> > > value is significant?
> > 
> > Yes, it is.  How much depends on the arch: some will just be slowed down
> > in some specific cases (x86-64), others will actually bus-fault, so the
> > compiler will have to change an unaligned access into two aligned
> > accesses plus magic, which translates to even slower access to unaligned
> > data.
> 
> What I mean is, given this data:
> 
> key = "abcdefghi", keylen = 9
> value = "the quick brown fox jumped over the lazy dog", vallen = 44
> 
> There are two possible on-disk representations (as bytes):
> 
> "abcdefghithe quick brown fox jumped over the lazy dogXXX", len = 56
> "abcdefghiXXXXXXXthe quick brown fox jumped over the lazy dogXXXX", len = 64
I thought you meant the alignment of the (char *) itself, not of the char it
references.

If you're going to access "value" every time you access "abcdefghi", and you
pack them together, you will get more of "value" hot in the cache, as the CPU
will load into L3->L2->L1 a block of cacheline size (which is 64 BYTES on a
modern Intel Xeon, for example).

So, the real question is: how much of "value" are you going to be accessing
most of the time, and will the addition of the padding cause it to require a
second cacheline fetch?

Aligned *(char *) access _can_ be faster for string operations: you're not
accessing a single *(char *), you are accessing a number of characters
starting at (char *), and either the microcode or libc may well decide to
optimize it into 32-bit or 64-bit compares when possible.

So, if the padding will not cause it to spill into a new cacheline, do the
padding.

> > However, anything high-performance (x86-64 included) does memory access
> > on cacheline-sized units anyway, so you should align data structures
> > inside the mmap()'d file based not only on field-type alignment (to
> > avoid giving the compiler even more reason to output crappy code), but
> > also align groups of fields to cacheline boundaries.
> > 
> > I'd recommend using 64 bytes as the cacheline size to optimize for.
> 
> Every record will be 64 bit aligned - the only question is whether
> there is value in aligning the value string inside that record.

64 *bytes* is a much better choice for record alignment, unless the record
is smaller than 32 bytes.  In that case, you'd align the record to 32 bytes,
and align groups of records that are likely to be accessed in sequence at 64
bytes.  This is valid for a processor with a cacheline size of 64 bytes.

> > > Still only space for 32 bits worth of records, so we're limited at 4
> > > billion records per skiplist.  I can live with that.
> > If you have space left until the next record considering cacheline
> > record alignment, maybe you could add an extra 32 bits of _zero_ padding
> > so that it is actually ready to be used as a 64-bit field...
> 
> We could change the count field to be 64 bits actually, and put the
> version at the end next to the CRC32.  Would it be better for data
> locality?

If it is not going to be worse, IMHO it is probably best to just go 64-bit
already.

> > Still, can you make a synthetic workload simulator for this?  You'd be
> > able to easily check the effect of various record and field padding
> > strategies, as well as use hardware profiling to ask the CPU about cache
> > misses, etc.
> 
> I suspect any CPU issue

It is the best way to know for sure, and if it is self-contained, you can
easily request test runs on different CPUs.

> > But _do_ make sure the important part of the file will align well with
> > the cacheline boundaries, or performance will tank.
> 
> Everything will align at 64 bit boundaries.

That's the ideal alignment for field access, but you want one extra
alignment: you want all fields that you're going to access close in time
(or, if they're small enough, groups of records that you're going to access
close in time) to be inside as few cachelines as possible.  That means
cacheline-size alignment is also important.

Cachelines are large: a typical performance x86-64 has a cacheline size of
64 *bytes*, not bits.  All memory access (DRAM, L3, L2 and L1) is done
entire cachelines at a time.

I'm sure you know all that, but since there was at least one 64-byte/64-bit
confusion, I'd rather write it down for future reference in case anyone
reads this thread in the archives.

> > What is the scenario you're trying to protect from?
> 
> Basically a partial write leaving the DB in an inconsistent state which
> a later load can't detect.  Here's the skiplist 1 version of the
> SAFE_TO_APPEND function:

...
> In skiplist2, there's no such distinction, we ALWAYS end with a COMMIT
> after either creating the file, or repacking it - so we don't need to
> check against logstart, and we don't special-case DELETE, so it's just:
> 
> /* is this a complete file? */
> static int SAFE_TO_APPEND(struct db *db)
> {
>     /* check it's bigger than the header */
>     if (db->map_size <= HEADER_SIZE)
>         return 1;
> 
>     /* and it must be a multiple of 8 bytes */
>     if (db->map_size % 8)
>         return 1;
> 
>     /* and it MUST end with a commit */
>     if (*((uint64_t *)(db->map_base + db->map_size - 8)) != COMMIT)
>         return 1;

You might want to validate the checksum/CRC/hash/whatever of the last block,
as well.

Notice that if cacheline alignment is being used for the record, and the
record fits inside a single cacheline, you will have the entire record
already in L1 cache when you try to read _any_ of the record's fields, so
checksumming/CRC-checking/hashing it can happen entirely from L1 cache.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh