On Tue, Aug 13, 2019 at 08:42:37PM +0000, Max Haughton via Digitalmars-d-learn wrote: > On Tuesday, 13 August 2019 at 18:54:58 UTC, H. S. Teoh wrote: [...] > > These days, with CPU multi-level caches and memory access > > predictors, in-place arrays are often the best option for > > performance, up to a certain size. In general, avoid indirection > > where possible, in order to avoid defeating the memory access > > predictor and reduce the number of cache misses. [...] > I saw a Bjarne Stroustrup talk where he benchmarked that the for n > > 1, std::vector was a lot faster than a linked list for all supported > operations. I don't know how clever the caching strategies are on a > modern processor (Pointer chasing), but to my knowledge the only way > of getting a cache efficient linked list would be to effectively have > a very contiguous allocator (Which obviously defeats the purpose of > using a list in the first place) > > Found it: https://www.youtube.com/watch?v=YQs6IC-vgmo
A contiguous allocator doesn't help after the list has undergone a large number of insertions/deletions, because of fragmentation. You can still get a cache-efficient linked list by chunking. I.e., instead of a linked-list of individual elements, group the elements into blocks (arrays) that are then linked into a linked list. Then you can still insert/delete elements with sub-O(1) complexity by updating only the affected block, and only occasionally you need to delete a block or allocate a new block. As long as the block size is suitably-sized, it will be cache-coherent most of the time and enjoy the associated performance boosts from the cache hierarchy. For small lists, it will be equivalent to using a single array. For very large lists, you'll also reap the cheap insert/delete flexibility of a linked list. To minimize space wastage from unused slots in the blocks, you could have a list rebalancing based on some desired utilization factor (e.g., if <50% of a block is used, redistribute elements among adjacent blocks). Redistribution should be pretty cache-coherent (except perhaps for some corner cases), because it mostly involves linear copying of elements from one block to another. T -- I am not young enough to know everything. -- Oscar Wilde