Elements are always inserted according to greater-than and less-than comparisons; equal can't happen. The first value inserted becomes the root, but if the tree gets heavy on one side, you rotate at the root to rebalance. From any given node, you know you will find the element you are looking for by applying the greater-than/less-than test at each node you visit. You can know this is always going to be true because that's how the inserts populate the tree, and the 'completeness' constraint enforces the 'bunching up' of data. As long as there is space between two values, they can sit as high up in the tree as the distance between them allows; each row downwards doubles the number of available slots. The 'completeness' constraint is extended by not allowing orphans: every child has a parent, all the way up to the root. Conversely, you know you will find every element in the tree by walking downwards and going left or right depending on greater than and less than; this is the comparability constraint. You should be able to see that a number of architectural features in this data storage system imply heuristics for optimisation.
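The insert-and-walk rule above can be sketched in a few lines of Go. This is a minimal illustration, not the actual implementation being discussed; the type and function names are mine, and rebalancing is omitted:

```go
package main

import "fmt"

// node is an illustrative binary-search-tree node.
type node struct {
	value       int
	left, right *node
}

// insert places v by the greater-than/less-than rule described above;
// equal values are simply ignored, so duplicates can't happen.
func insert(n *node, v int) *node {
	if n == nil {
		return &node{value: v}
	}
	switch {
	case v < n.value:
		n.left = insert(n.left, v)
	case v > n.value:
		n.right = insert(n.right, v)
	}
	return n
}

// search walks downwards, going left or right at each node
// depending on the same comparison.
func search(n *node, v int) bool {
	for n != nil {
		switch {
		case v == n.value:
			return true
		case v < n.value:
			n = n.left
		default:
			n = n.right
		}
	}
	return false
}

func main() {
	var root *node
	for _, v := range []int{8, 3, 10, 1, 6} {
		root = insert(root, v)
	}
	fmt.Println(search(root, 6), search(root, 7)) // true false
}
```

Because every insert follows the same comparison rule, every search is guaranteed to find its target by the same walk, which is the comparability constraint in action.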
Just to update: reading about B-heaps and how they structure storage, I realised that I need to think of the tree in terms of 4 KiB blocks (on most platforms), since that is the unit of memory accessed in one operation. However many rows of a given payload size fit into 4 KiB is the size of each subtree. So for 32-bit values, I have one page at the top, and then 1024 pages that are the left and right child subtrees of the 512 elements at the bottom of the first page, and of course this repeats again. This changes how I have to implement the walk functions: when the walk overflows past those 512 bottom elements, I know I have moved into the second row of pages. So I have to go back to the drawing board a little on the structure of the algorithm, since it now works on 4 KiB page sizes. I may switch up the design so that the pages are the primary indices of the first dimension, and then the pages themselves, the subtrees, can be allocated as fixed-length arrays, which also simplifies how Go will deal with them. Further, instead of allocating whole rows at a time, only those subtrees that have actually been breached require allocation of a new page. In this respect the question about search cost becomes a little different: since we use 4 KiB pages, and we know walking within a page goes on inside SRAM cache (on the order of 14 GB/s), performance optimisation by tree balancing has a somewhat different character than with the binary heap's linear mapping. Why do I seem so confident about the potential performance of this? Look up B-heaps and the Varnish reverse proxy. Varnish is designed at the low level with the same basic concept in mind, using B-heaps: structure the heap segments according to memory pages. If you are familiar with filesystems, you may know about block alignment. It's not quite so important for spinning disks, but for flash and S/DRAM, memory is only ever handled at the hardware level in these block sizes.
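The page layout described above can be sketched with some index arithmetic. This is a sketch under the post's stated assumptions (4 KiB pages of 32-bit values, so each page holds a 1023-node complete subtree of 10 rows, whose 512 leaves fan out to 1024 child pages); the constants and function names are illustrative, not from an actual implementation:

```go
package main

import "fmt"

const (
	pageBytes    = 4096                   // one memory page on most platforms
	valueBytes   = 4                      // 32-bit payloads
	slotsPerPage = pageBytes / valueBytes // 1024 slots
	rowsPerPage  = 10                     // 2^10 - 1 = 1023 nodes fit, one slot spare
	nodesPerPage = 1<<rowsPerPage - 1     // 1023
)

// page is one subtree, allocated as a fixed-length array, which Go
// lays out as a single contiguous block.
type page [slotsPerPage]uint32

// childPage maps a leaf slot of one page to the index of the child
// page holding its left (dir=0) or right (dir=1) subtree. The leaves
// of a 10-row subtree are the 512 nodes at in-page indices 511..1022,
// so each page fans out to 1024 child pages.
func childPage(leafIndex, dir int) int {
	leafOrdinal := leafIndex - (1<<(rowsPerPage-1) - 1) // 0..511
	return 2*leafOrdinal + dir                          // 0..1023
}

func main() {
	fmt.Println(nodesPerPage)       // 1023
	fmt.Println(childPage(511, 0))  // first leaf, left child: page 0
	fmt.Println(childPage(1022, 1)) // last leaf, right child: page 1023
}
```

A walk then stays inside one 4 KiB page for ten comparisons before crossing a page boundary, which is the property that makes page-aligned layout attractive; only pages that a walk actually breaches need to be allocated.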
Disks usually use 512-byte blocks at the FS level, and most disks have 512-byte blocks; some have 1k, 2k, 4k or 8k. Most current-generation filesystems use 4k blocks as the default size, because that is a 1:1 mapping with memory pages. The difference with what I am designing is that it does not order vertically, it orders horizontally. It's basically fractal: each subtree has the same number of potential subtrees below it. Any code that keeps data aligned to memory-page and disk-block sizes is automatically significantly faster, because misalignment doubles the amount of memory that has to be accessed to satisfy a request. This is why binary heaps are way slower than B-heaps. Anyway, because many of the assumptions in my conceptual model have changed (and big thanks to you guys), I have to rework the design to account for this. Thanks :p

On Wednesday, 25 April 2018 15:11:15 UTC+3, rog wrote:
> On 25 April 2018 at 10:24, Louki Sumirniy <louki.sumir...@gmail.com> wrote:
> > As you look deeper into the link discussing the B-heap you can see that actually I am pretty much exactly following the same general structure in my algorithm. The structure will align neatly with page boundaries, and that means fewer page faults and reduced pressure on the virtual memory mapping system, which means the tree can grow a lot larger while cutting down delays on retrieving data. I intuitively understood this in my own visual model, and this picture in particular is exactly how I visualised the data storage map:
>
> To expand slightly on why this data structure isn't good for searching, imagine you're searching for the number 40 in either of those pictured data structures. When you're looking at the root, you have to make a decision whether to move to the left or the right child, but you can only see 2 and 3. How do you decide that you need to go left?
> Now imagine you're searching for 31 - how do you decide that the correct direction is right?

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
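rog's objection above can be made concrete with a small sketch. In a binary heap stored as a flat array (children of index i at 2i+1 and 2i+2), the only ordering guarantee is parent versus children, so a search cannot choose a direction and may have to visit both subtrees, worst-case O(n), unlike a search tree. This is an illustrative example, not code from the thread:

```go
package main

import "fmt"

// heapSearch looks for v in a min-heap laid out as a flat array.
// The only pruning available is that a whole subtree can be skipped
// when its root already exceeds v; otherwise BOTH children must be
// searched, because the heap property says nothing about left vs right.
func heapSearch(h []int, i, v int) bool {
	if i >= len(h) || h[i] > v { // min-heap: everything below h[i] is >= h[i]
		return false
	}
	if h[i] == v {
		return true
	}
	return heapSearch(h, 2*i+1, v) || heapSearch(h, 2*i+2, v)
}

func main() {
	// A valid min-heap: from the root you see only 2 and 3, yet 40
	// sits in the left subtree and 31 in the right.
	h := []int{2, 3, 5, 40, 7, 31, 8}
	fmt.Println(heapSearch(h, 0, 40), heapSearch(h, 0, 31), heapSearch(h, 0, 9)) // true true false
}
```

That exhaustive descent is exactly why a heap-shaped layout needs the search-tree ordering constraint (or an auxiliary index) before it can answer lookups efficiently.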