Elements are always inserted according to greater-than/less-than comparisons; 
equal values can't happen. The first value inserted becomes the root, and if 
the tree later gets heavy on one side, the root is rotated to rebalance it. 
From any given node you know which way to descend to find the element you are 
looking for, just by comparing it against the node you are at. This always 
holds because the inserts populate the tree that way, and the 'completeness' 
constraint forces the data to bunch up towards the top. So long as there is a 
gap between two values, they can sit as high up in the tree as that gap 
allows, since each row downwards doubles the number of available slots. The 
'completeness' constraint is extended by not allowing orphans: every child has 
a parent, all the way up to the root. Conversely, you can reach every element 
in the tree by walking downwards and going left or right according to the 
comparison; this is the comparability constraint. You should be able to see 
that several architectural features of this storage layout suggest heuristics 
for optimisation.
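
Just to make the walking rule concrete, here is a minimal sketch of the kind 
of search I mean, assuming the tree is stored as a flat slice in 'complete' 
(heap-style) order with the children of node i at 2*i+1 and 2*i+2, and zero 
marking an empty slot - the names are illustrative, not my actual code:

// A minimal sketch of the walk described above, assuming a flat slice
// in "complete" (heap-style) order, children of node i at 2*i+1 and
// 2*i+2, and zero marking an empty slot.
package main

import "fmt"

// search walks down from the root, going left or right on each
// greater-than/less-than test, and reports the index of the key.
func search(tree []uint32, key uint32) (int, bool) {
    i := 0
    for i < len(tree) && tree[i] != 0 {
        switch {
        case key == tree[i]:
            return i, true
        case key < tree[i]:
            i = 2*i + 1 // descend left
        default:
            i = 2*i + 2 // descend right
        }
    }
    return -1, false
}

func main() {
    // Root 8; every left descendant is smaller, every right one larger.
    tree := []uint32{8, 4, 12, 2, 6, 10, 14}
    fmt.Println(search(tree, 6))  // 4 true
    fmt.Println(search(tree, 13)) // -1 false
}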

Just to update: reading about B-heaps and how they structure storage, I 
realised that I need to think of the tree in terms of 4kB blocks (on most 
platforms), as this is the amount of memory that is fetched in one operation. 
However many rows of the payload size fit into 4kB determines the size of 
each subtree. So for 32-bit values I have one page at the top, and then 1024 
pages holding the left and right child pairs of the 512 elements on the 
bottom row of the first page, and of course this repeats again at each level. 
This changes how I have to implement the walk functions: when the index 
overflows past the 512-element bottom row of a page, I know I have moved down 
into the next page-row.
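
Roughly, the geometry works out like this (a sketch assuming 4kB pages of 
32-bit values; the names and the childLocation helper are hypothetical):

// Rough arithmetic for the page geometry described above, assuming
// 4kB pages of 32-bit values (hypothetical names).
package pagetree

const (
    pageBytes    = 4096
    valueBytes   = 4
    slotsPerPage = pageBytes / valueBytes // 1024 slots per page
    rowsPerPage  = 10                     // 2^10 - 1 nodes fit, one slot spare
    nodesPerPage = 1<<rowsPerPage - 1     // 1023
    bottomRow    = 1 << (rowsPerPage - 1) // 512 nodes on a page's last row
    childPages   = 2 * bottomRow          // 1024 child pages hang off each page
)

// childLocation maps an in-page heap index that has overflowed past the
// bottom row to the child page it falls into; the walk then continues
// at slot 0 (the root) of that page.
func childLocation(childIndex int) (childPage, slot int) {
    if childIndex < nodesPerPage {
        return -1, childIndex // still inside the current page
    }
    return childIndex - nodesPerPage, 0
}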

So... I have to go back to the drawing board a little on the structuring of 
the algorithm, now that it works on 4kB page sizes. I may have to switch the 
design around so that pages are the primary index in the first dimension, and 
the pages themselves, the subtrees, are allocated as fixed-length arrays, 
which also simplifies how Go deals with them. Further, instead of allocating 
whole rows at a time, only those rows that have actually been breached 
require allocation of a new page for their subtree. In this respect the 
question about search cost becomes a little different: since we use 4kB pages 
and walking within a page stays inside the CPU's fast SRAM cache, performance 
optimisation by tree balancing has a somewhat different character than with 
the Binary Heap's linear mapping.
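
Something like this is what I have in mind for the lazy page allocation - 
again just a sketch under the assumptions above, with hypothetical names:

// A sketch of lazily allocated fixed-size pages, assuming the same
// geometry as the previous fragment (constant repeated so this
// stands alone); the names are hypothetical.
package pagetree

const slotsPerPage = 1024 // 4096 bytes / 4-byte values

// page is a fixed-length array, so Go allocates each subtree as one
// contiguous block.
type page [slotsPerPage]uint32

type tree struct {
    pages map[int]*page // page number -> page, allocated on demand
}

func newTree() *tree {
    return &tree{pages: map[int]*page{0: new(page)}} // only the root page up front
}

// pageFor returns the numbered page, allocating it the first time a
// walk breaches that subtree, so untouched subtrees cost nothing.
func (t *tree) pageFor(n int) *page {
    p, ok := t.pages[n]
    if !ok {
        p = new(page)
        t.pages[n] = p
    }
    return p
}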

Why do I seem so confident about the potential performance of this? Look up 
B-heaps and the Varnish reverse proxy. Varnish is designed at the low level 
with the same basic concept in mind, using B-heaps: structure the heap 
segments according to memory pages. If you are familiar with filesystems, you 
may know about block alignment. It's not quite so important for spinning 
disks, but for flash and S/DRAM the hardware only ever handles memory in 
these block sizes. Most disks present 512-byte blocks (some 1k, 2k, 4k or 
8k), and most current-generation filesystems use 4k blocks as the default 
size, because that maps 1:1 onto memory pages.
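
For what it's worth, the page-size assumption is easy to check at runtime in 
Go with os.Getpagesize():

// Quick check of the page-size assumption; os.Getpagesize() reports
// the OS memory page size for the current platform.
package main

import (
    "fmt"
    "os"
)

func main() {
    ps := os.Getpagesize()
    fmt.Printf("page size: %d bytes, %d uint32 slots per page\n", ps, ps/4)
}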

The difference with what I am designing is that it does not order vertically; 
it orders horizontally. It's basically fractal: each subtree has the same 
number of potential subtrees below it.

Any code that keeps data aligned to memory-page and disk-block sizes tends to 
be significantly faster, because a misaligned access can straddle a boundary 
and double the amount of memory that has to be fetched to satisfy a request. 
This is a big part of why Binary Heaps end up so much slower than B-heaps.
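
To put a number on that, here's a tiny, hypothetical helper showing how a 
value straddling a page boundary costs two page accesses instead of one:

// A back-of-envelope illustration of the alignment point: a value
// whose bytes straddle a page boundary needs two page accesses
// instead of one. Hypothetical helper, not part of the design.
package pagetree

func pagesTouched(offset, size, pageSize int) int {
    first := offset / pageSize
    last := (offset + size - 1) / pageSize
    return last - first + 1
}

// pagesTouched(4092, 4, 4096) == 1  (aligned: one page)
// pagesTouched(4094, 4, 4096) == 2  (misaligned: two pages)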

Anyway, because many of the assumptions behind my conceptual model have 
changed, thanks in large part to you guys, I have to rework the design to 
account for this. Thanks :p

On Wednesday, 25 April 2018 15:11:15 UTC+3, rog wrote:
>
> On 25 April 2018 at 10:24, Louki Sumirniy 
> <louki.sumir...@gmail.com> wrote: 
> > As you look deeper into the link discussing the B-heap you can see that 
> actually I am pretty much exactly following the same general structure in 
> my algorithm. The structure will align neatly with page boundaries and that 
> means less page faults and reduced pressure on the virtual memory mapping 
> system that means actually the tree can go a lot larger while cutting down 
> delays on retrieving data. I intuitively understood this in my own visual 
> model and this picture in particular is exactly how I visualised the data 
> storage map: 
>
> To expand slightly on why this data structure isn't good for 
> searching, imagine you're searching for the number 40 in either of 
> those pictured data structures. When you're looking at the root, you 
> have to make a decision whether to move to the left or the right 
> child, but you can only see 2 and 3. How do you decide that you need 
> to go left? Now imagine you're searching for 31 - how do you decide 
> that the correct direction is right? 
>
