> Any code that keeps data aligned to memory page and disk page sizes is
> automatically significantly faster, because misalignment automatically
> doubles the amount of memory that has to be accessed to satisfy a request.
> This is why Binary Heaps are way slower than B-heaps.
My opinion is that without measurement you will miss necessary knowledge
hidden between your assumptions. The testing package's benchmark support has
non-obvious features for producing repeatable numbers:
https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go

I see you have logic tests
(https://github.com/calibrae-project/bast/blob/master/pkg/bast/bast_test.go),
and benchmarks are the next step.

Matt

On Wednesday, April 25, 2018 at 9:37:48 AM UTC-5, Louki Sumirniy wrote:
>
> Elements are always inserted according to greater-than and less-than;
> equal can't happen. The first value inserted becomes the root, but if the
> tree gets heavy on one side, you rotate the root to rebalance. From any
> given node, you know you will find the element you are looking for by
> applying the greater-than/less-than test, because that is how the inserts
> populate the tree, and the 'completeness' constraint enforces the
> 'bunching up' of data. So long as there is space between two nodes, they
> can be as high up in the tree as the distance between them allows; each
> row downwards doubles the number of available positions. The
> 'completeness' constraint is extended by not allowing orphans: every child
> has a parent, all the way up to the root. Vice versa, you know you will
> find every element in the tree by walking downwards and going left or
> right depending on greater-than or less-than; this is the comparability
> constraint. You should be able to see that a number of architectural
> features in this data storage system imply heuristics for optimisation.
>
> Just to update: reading about B-heaps and how they structure storage, I
> realised that I need to think of the tree in terms of 4 KB blocks (on most
> platforms), as this is the amount of memory that is accessed in one
> operation. However many rows of a given payload size fit into 4 KB is the
> size of each subtree.
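That rows-per-page arithmetic can be sketched in a few lines of Go. This is an illustration only, not code from the bast repository; it assumes the 4 KiB page and 32-bit payload sizes mentioned above and computes how many complete binary-tree rows fit in one page:

```go
package main

import "fmt"

func main() {
	const pageSize = 4096 // bytes per memory page
	const payload = 4     // bytes per element (32-bit values)

	slots := pageSize / payload // elements that fit in one page

	// Rows of a complete binary tree that fit in a page: 2^rows - 1 <= slots.
	rows := 0
	for (1<<(rows+1))-1 <= slots {
		rows++
	}
	bottom := 1 << (rows - 1) // nodes in the page's bottom row
	children := bottom * 2    // child pages hanging off one page

	fmt.Println(slots, rows, bottom, children) // 1024 10 512 1024
}
```

For 32-bit values this yields 10 rows (1023 nodes) per 4 KiB page, a bottom row of 512 nodes, and 1024 child pages, matching the numbers in the post below.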
> So for 32-bit values, I have one page at the top, and then 1024 pages that
> are the left and right children of the 512 elements at the bottom of the
> first page, and of course this repeats again below. This changes how I
> have to implement the walk functions: when a walk overflows past the 512th
> element, I know I have moved into the second page-row.
>
> So I have to go back to the drawing board a little on the structuring of
> the algorithm, since it works on 4 KB page sizes. I may have to switch the
> design so that the pages are the primary indices of the first dimension,
> and the pages themselves, the subtrees, are allocated as fixed-length
> arrays, which also simplifies how Go will deal with them. Further, instead
> of allocating whole rows at a time, only those rows that have been
> breached require allocation of a new page for a subtree. The question of
> search cost then becomes a little different: since we use 4 KB pages, and
> walking within one happens inside 14 Gb/s SRAM cache, performance
> optimisation by tree balancing has a somewhat different character than in
> the Binary Heap's linear mapping.
>
> Why do I seem so confident about the potential performance of this? Look
> up B-heaps and the Varnish reverse proxy. It is designed at the low level
> with the same basic concept in mind, using B-heaps: structure the heap
> segments according to memory pages. If you are familiar with filesystems,
> you may know about block alignment. It is not quite so important for
> spinning disks, but for flash and S/DRAM, memory is only ever handled at
> the hardware level in these block sizes. Most disks have 512-byte blocks
> at the FS level, some 1, 2, 4 or 8 KB, and most current-generation
> filesystems use 4 KB blocks by default, because that is a 1:1 mapping
> with memory pages.
>
> The difference with what I am designing is that it does not order
> vertically, it orders horizontally.
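The layout described above — pages as the first dimension, each subtree a fixed-length array allocated only when breached — might be sketched like this. All names here are hypothetical illustrations, not bast's actual API:

```go
package main

import "fmt"

// A 10-row complete subtree of 32-bit values holds 1023 nodes,
// fitting in one 4 KiB page with one slot to spare.
const pageSlots = 1023

type page [pageSlots]uint32

// tree indexes pages in the first dimension; a nil entry means that
// subtree has never been breached, so no 4 KiB block is allocated for it.
type tree struct {
	pages []*page // index 0 is the root page; children follow breadth-first
}

// set writes v at (pageIdx, slot), allocating the page on first touch.
func (t *tree) set(pageIdx, slot int, v uint32) {
	for len(t.pages) <= pageIdx {
		t.pages = append(t.pages, nil) // index space costs one pointer per page
	}
	if t.pages[pageIdx] == nil {
		t.pages[pageIdx] = new(page) // one fixed-length 4 KiB array
	}
	t.pages[pageIdx][slot] = v
}

func main() {
	var t tree
	t.set(0, 0, 42)  // root page allocated
	t.set(3, 100, 7) // a deeper subtree; pages 1 and 2 stay nil
	fmt.Println(len(t.pages), t.pages[1] == nil, t.pages[0][0])
}
```

Using a fixed-size array type for the page (rather than a slice) keeps each subtree's storage contiguous, which is what makes the page-alignment argument apply at all.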
> It's basically fractal: each subtree has the same number of potential
> subtrees below it.
>
> Any code that keeps data aligned to memory page and disk page sizes is
> automatically significantly faster, because misalignment automatically
> doubles the amount of memory that has to be accessed to satisfy a request.
> This is why Binary Heaps are way slower than B-heaps.
>
> Anyway, because many of the assumptions in my conceptual model have
> changed, and big thanks to you guys, I have to rework the design to
> account for this. Thanks :p
>
> On Wednesday, 25 April 2018 15:11:15 UTC+3, rog wrote:
>>
>> On 25 April 2018 at 10:24, Louki Sumirniy
>> <louki.sumir...@gmail.com> wrote:
>> > As you look deeper into the link discussing the B-heap, you can see
>> > that I am actually pretty much following the same general structure in
>> > my algorithm. The structure will align neatly with page boundaries,
>> > which means fewer page faults and reduced pressure on the virtual
>> > memory mapping system, so the tree can grow a lot larger while cutting
>> > down delays on retrieving data. I intuitively understood this in my
>> > own visual model, and this picture in particular is exactly how I
>> > visualised the data storage map:
>>
>> To expand slightly on why this data structure isn't good for searching,
>> imagine you're searching for the number 40 in either of those pictured
>> data structures. When you're looking at the root, you have to make a
>> decision whether to move to the left or the right child, but you can
>> only see 2 and 3. How do you decide that you need to go left? Now
>> imagine you're searching for 31 - how do you decide that the correct
>> direction is right?
>>

--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
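Returning to the benchmarking step Matt recommends at the top of the thread: the standard testing package's harness tunes b.N until timings stabilise, which is what gives repeatable numbers. A minimal sketch, with `insert` as a hypothetical stand-in for whatever bast exports:

```go
package main

import (
	"fmt"
	"testing"
)

// sink prevents the compiler from optimising the benchmarked work away.
var sink uint32

// insert is a hypothetical stand-in for the tree's insert operation.
func insert(v uint32) { sink += v }

func main() {
	// testing.Benchmark runs the harness outside `go test`; in a real
	// _test.go file this body would be func BenchmarkInsert(b *testing.B).
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			insert(uint32(i))
		}
	})
	fmt.Println(res.N > 0) // the harness chose an iteration count
}
```

In practice you would put the `func BenchmarkInsert(b *testing.B)` form next to bast_test.go and run it with `go test -bench=.`.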