On Tuesday, 24 April 2018 11:15:34 UTC+3, rog wrote:
>
> On 23 April 2018 at 19:58, Louki Sumirniy
> <louki.sumir...@gmail.com> wrote:
>
 

> > I set many of those identifiers to export because I wanted to keep
> > separate derived libraries that replaced the storage types embedded into
> > the main library. These limitations would not exist were the 'back store'
> > over the wire or on disk. They only exist for the storage that is most
> > accessible to the program, the main memory.
>
> I don't understand this. If you're replacing the underlying storage 
> types, why would you need to export details of an algorithm that the 
> storage types shouldn't need to have anything to do with? 
>

Because if they can't be accessed outside the package they are declared in, 
I am forced to add the alternative data types within that same package. This 
is a massive loss of flexibility (and means an importing application cannot 
implement its own variant). In my test/scratch code, direct access was 
impossible even from a neighbouring package in the same folder tree because 
of package scoping, so I capitalised the fields: if it can't be done there, 
it certainly can't be done from a different pkg/name folder holding another 
implementing library.
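To make concrete what I ran into, here is a hypothetical two-package sketch 
(the paths and names are invented for illustration, not the real layout):

// file bast/bast.go
package bast

type Bast struct {
    data []uint32 // unexported: invisible outside package bast
    Fill []int    // exported: reachable from any importing package
}

// file derived/derived.go (a separate package; import path illustrative)
package derived

import "bast"

func Variant(b *bast.Bast) {
    _ = b.Fill // fine, exported
    // _ = b.data // compile error: b.data undefined (unexported)
}

So a derived back-store implementation living in its own package can only 
reach the capitalised names.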
 

> > Regarding the use of zero as a nil, that is already accepted by numerous
> > built in types (error, for example), and in this case the reason why I
> > have put that into the *abstract* type is because the data I am writing
> > this algorithm for is hashes, which are never zero, at least in practise:
> > over a period of decades, no hash functions produce zero out of any
> > significant amount of known data.
>
> Why does your algorithm need to know about the possible nil-ness of 
> the stored items? It doesn't use that property now, at any rate. 
>

Zero is the default signifier of an empty node: it acts as the sentinel that 
tells a search it has fallen off the edge of a branch. But zero isn't 
necessarily the 'empty node' value for every backing type. An implementation 
using strings needs to compare against "", struct and slice types would need 
to test for nil, and one can conceive of an application where the empty-field 
sentinel actually has a value (in which case initialisation would require 
pre-filling the data with that sentinel, of course).
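A rough sketch of what I mean, with hypothetical names (the real library 
binds these differently):

package main

import "fmt"

// nullTester reports whether a stored value is the 'empty node' sentinel.
type nullTester func(interface{}) bool

var (
    uintIsNull   nullTester = func(v interface{}) bool { return v.(uint32) == 0 }
    stringIsNull nullTester = func(v interface{}) bool { return v.(string) == "" }
    sliceIsNull  nullTester = func(v interface{}) bool { return v.([]byte) == nil }
)

func main() {
    fmt.Println(uintIsNull(uint32(0)))    // true: zero is empty for hashes
    fmt.Println(stringIsNull("abc"))      // false
    fmt.Println(sliceIsNull([]byte(nil))) // true
}

The tree only ever asks its nullTester hook, so each backing store supplies 
whatever sentinel makes sense for its type.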
 

>
> > This idea that I should be hiding all the functions smells like OOP to me.
>
> This isn't OOP - it's just good code hygiene. By hiding implementation 
> details, you make the public interface clear and you prevent clients 
> from relying on internal implementation details that you might want to 
> change. 
>

How do I override the functions that implement the data abstraction if those 
functions are not exported? The internal implementation details of this 
specific library will not change once I hit the beta milestone; this is not 
a complex application, just a data storage/search library. Somewhere along 
the line there have to be some exported elements that can be passed back and 
forth, and possibly some of these will be encapsulated like this. But 
encapsulation can be a big obstacle to reuse and derivation.
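The pattern I am relying on, boiled down to a hypothetical minimum (the field 
and constructor names here are placeholders, not the library's actual ones):

package main

import "fmt"

type Tree struct {
    // IsNull is exported so an importing package can substitute its own
    // test without forking this one.
    IsNull func(d uint32) bool
}

func New() *Tree {
    return &Tree{IsNull: func(d uint32) bool { return d == 0 }}
}

func main() {
    t := New()
    // a derived library overrides the default in place:
    t.IsNull = func(d uint32) bool { return d == 0xFFFFFFFF }
    fmt.Println(t.IsNull(0), t.IsNull(0xFFFFFFFF)) // false true
}

If that function field were unexported, the reassignment would only be 
possible from inside the defining package.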

Specific points that clearly do not need to be visible do need to be pointed 
out. If someone is *extending* this then, to be honest, they should just be 
forking it anyway rather than importing it. Extension is also not as easy to 
do in Go as in an OOP language: I had to do very specific things in the base 
type's struct spec in order to enable overriding at all. In an OOP language 
you can override until the cows come home, and that is exactly the 
'fragility' problem of OOP classes that Go specifically aims to eliminate by 
enforcing composition.

If a user of the library were to embed the structure in another type and 
build on these elements via an import rather than a fork, they would be 
depending on me not to change the way the internals work. Everything I have 
exposed is specifically exposed to allow an external library to simply 
reimplement the set of accessors. These accessors also need to know the 
Cursor type, as this is how the tree is walked.

If I were writing a much more complex library, then clearly only the 
interface should be exposed, and the interface would need to be immutable 
between version bumps.
 

> > As it stands with how it has to be implemented in Go, the accessing of
> > these would-be-private-in-C++ scenario, is very cumbersome due to the
> > need to use type assertions. I have tinkered with this for educational
> > value but I don't want to have to deal with this in the actual
> > implementation. A key insight of Schneier's work is 'security by
> > obscurity' is weak. Private functions fit that maxim pretty well.
>
> Private functions are not "security by obscurity". Private functions 
> aren't "obscure" (you can see them in the source perfectly well) but 
> they are "secure" because you cannot access them. 
>

This is nonsense, because all I have to do is fork and expose; and what 
exactly is the danger in my knowing the internals? I follow FP methodology 
fairly strictly wherever possible, and I did not design this library with 
multi-threading in mind, so side effects and conflicting state changes are 
not exposed to races. It is specifically designed to be single threaded.
 

> > I think that there is no real benefit in doing this, since after all,
> > this is the source, and if you can be told you can't access that thing
> > that is 'private' then it's not terribly private in reality, only for
> > arbitrary philosophical reasons excluded from your programmatic access
> > without forking the code.
>
> As above, doing this is considered good code hygiene. If you do decide 
> to fork the code and make something public, then it's obvious that 
> you're breaking the API boundaries. 
>

You are probably right that a few parts of this should not be exported: 
specifically, anything that cannot be extended, such as the tree balance 
functions and the details they work with, the fill tracking array, and 
probably also the depth and length values, since one thing that would 
totally screw up this algorithm is callers clobbering that state and 
triggering errors (I do test for every condition that would cause an array 
bounds exception, though, and such clobbering would be the fault of the 
consumer). It's still a work in progress; I only just got a working 
implementation of the backing store abstraction yesterday. I'm not a 
professional OOP programmer, or any kind of professional. This is a work of 
love, and one I think will help me achieve goals for future projects, at 
least with a high likelihood for the next project on my agenda.
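Something like this hypothetical sketch is probably the direction to take: 
keep the mutable balance state unexported behind read-only accessors (the 
field names are stand-ins for the actual ones):

package bast

// Bast keeps its balance bookkeeping private so callers cannot clobber it.
type Bast struct {
    fill  []int // per-row fill counts, maintained by insert/delete
    depth int   // current number of rows
}

// Depth reports the tree depth without exposing the field for writing.
func (b *Bast) Depth() int { return b.depth }

// RowFill reports the fill count of one row, guarding the bounds.
func (b *Bast) RowFill(row int) int {
    if row < 0 || row >= len(b.fill) {
        return 0
    }
    return b.fill[row]
}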
 

> > The search functions are already in there, it's a piece of cake, given
> > the ability to test for equality, lesser and greater. That was probably
> > one of the first things I implemented, because it's so simple. Nobody has
> > ever implemented a binary search tree with a dense store like this
> > before, and so the insert and delete functions are not there yet. Just
> > use your imagination for a little bit, even with a 4 row *dense* tree,
> > and consider what happens when I delete a node with children, let's say,
> > at row 3, with two children. Where do those children go? They are now
> > orphans, as their parent has been cut out of the tree. So, and critical
> > to the design of BAST, is that it be possible to perform short
> > optimisations during delete/insert write operations that do not cost too
> > much time, but at the same time, do not cause the search cost to become
> > excessively spread out, and specifically with delete, you can't have
> > orphans. They have to be pushed up and inwards; it's not complicated, but
> > it gets more complicated the more nodes are children of the severed node.
> >
> > It's not a model for tree structures that ANYONE is familiar with.
>
> I think you overestimate the novelty of your algorithm. It sounds to 
> me like it bears significant resemblance to a "complete binary tree" 
> (see https://xlinux.nist.gov/dads/HTML/completeBinaryTree.html). 
>

A complete binary tree does not have a concept of empty nodes.

It uses references to bind nodes together, hence the memory storage overhead 
and the randomness (usually insert order) of the node layout in memory.

Those random memory locations of the node data are precisely what this 
algorithm aims to eliminate, as well as reducing memory overhead. Walk 
operations only require very short 32-bit integer math functions to locate 
nodes in various relations: mainly parent, child, and a lateral direction 
that treats the tree as though it were a sparse sorted list (sparse because 
some elements will usually be empty).
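For the flavour of it, this is the sort of arithmetic involved, using the 
standard root-at-zero array layout for a complete binary tree (whether BAST 
keeps exactly these formulas is beside the point; the point is that every 
step is a shift and an add, with no pointers to chase):

package main

import "fmt"

// root is index 0; callers must not ask for the parent of the root.
func parent(i uint32) uint32 { return (i - 1) >> 1 }
func left(i uint32) uint32   { return i<<1 + 1 }
func right(i uint32) uint32  { return i<<1 + 2 }

func main() {
    i := uint32(5)
    fmt.Println(parent(i), left(i), right(i)) // 2 11 12
}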

I was hoping you had put me onto something that resembled what I am doing, 
but no. Sparse versus dense is the conceptual polarity I am working from 
here, not partly-empty versus complete. The powers-of-two formulas I use let 
me derive the relations between members of a one-dimensional array that maps 
onto a complete binary tree, but the structure is not a complete binary 
tree, because it has empty nodes, and those empty nodes are very important: 
they mark the momentary boundaries and constraints required for search, 
insert and delete (especially delete, where a subtree can otherwise be 
severed and become unreachable).
 

> > Unlike simple, stupid newbie mistakes, this is intentional. I am writing
> > this algorithm because it, by its architecture, reduces random access of
> > memory, and in this, it gains a benefit of reduced cache misses in the
> > CPU and thus will have, potentially, a serious performance advantage
> > compared to any search-and-sort algorithm whether it be hashtables or
> > vector reference based binary search trees.
>
> You keep on making references to the efficiency of this algorithm, but 
> I don't see any benchmarks or algorithm analysis that supports your 
> viewpoint here. 
>

Benchmarks would require a completed algorithm. I will absolutely profile it 
once it's done, to work out the comparative processing load of each type of 
operation. I have a vague concept of the heuristics for the optimisation 
operations, but I had been putting off filling those parts in because I 
hadn't yet achieved the data type abstraction.
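When it is done, the profiling can hang off the standard testing package, 
roughly like this skeleton (the populate step is left as a comment because 
insert isn't finished yet, and the search call is a stand-in):

package bast

import "testing"

func BenchmarkIsNull(b *testing.B) {
    t := NewBast()
    // populate t with representative hash data here, once insert exists
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = t.IsNull(uint32(i)) // stand-in for the real search call
    }
}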

The general rules for these tree rotation operations, in the BAST idiom, are 
simple and mostly depend on the lateral walk operations, though they also 
need to know when the direction is up or down at each sideways step. The 
root can fairly easily be displaced sideways from one side to the other to 
maintain left/right balance, and this operation can precede the insert and 
delete operations. In some cases it may be better to displace stepwise 
sideways from an insert on the overweight side, but if that were always how 
it was done, it would become a very expensive operation on a large tree.

At this point the *potential* for a performance benefit is fairly clear, but 
it can't be said with certainty how large it is, under what conditions it 
holds, and indeed when there might be a disadvantage. A lot of this hinges 
on the tree balancing operations. It is almost unquestionably true, however, 
that for searches this structure will reduce memory access latency, and if 
it were implemented on a spinning disk the benefit would be extraordinary 
(and this is something I might be able to learn about from the trees used in 
filesystems).

(On that note, I am now going to go and do a little reading.)

Thanks for your discussion about this... it's implicitly complimentary to 
have my ruminations actually read and considered.
 

>   cheers, 
>     rog. 
>
> > Yes, on the last point, it always uses power of two because it is
> > modelling a dense binary tree, hence the name 'Bifurcation Array'. Thus
> > also why I haven't finished the insert/delete functions. You can't just
> > spin things around like in a vector based BST. There is a cost in search
> > linearity for the differentials of path lengths from the root, but here's
> > the thing - if you want to dodge allocating a new row, you could instead
> > find the value that you could push inwards to the other side, and trace a
> > path until there is an inner empty child that you can 'rotate' into. Most
> > of the time this will not exceed maybe 32kb of data, and for most sizes
> > of trees you would be working with, no more than 1mb, and guess what,
> > that all fits within a CPU cache, which means manipulating that memory
> > takes no wait time and data transfers at 4-6 times the speed of the
> > front side bus.
> >
> > For the same reason I have functions and variables that track the fill
> > of each row of the tree, as well as one at the top that totals the left
> > and right. These values can be quickly updated with each insert and
> > delete, and can be used to feed heuristics that decide what will probably
> > be the best way to shuffle the tree around to maintain its low variance
> > of minimum and maximum search time.
> >
> > It's for realtime hash tables that change rapidly, and for proof of work
> > algorithms that would otherwise entail a high and mostly fixed sorting
> > cost to restructure for fast searching, so that after each operation the
> > tree is in a fairly optimal state for searches as well as inserts and
> > deletes.
> >
> > I might be talking in ways that more experienced programmers and CS
> > people may think to be naive, but I don't think it is naive to think
> > that we can make more hay out of the SRAM in CPU caches, especially for
> > applications that are becoming extremely critical to our lives.
> > Databases specifically, but as a cryptocurrency miner, the onslaught of
> > ASICs has given me a personal and visceral motivation to try to find a
> > way to eliminate the differential between specialised hardware and
> > commodity CPUs. It's an interesting fact, not really discussed much,
> > that the architecture of CPUs is precisely a database-specific design.
> > There has been some progress in using GPUs to accelerate graph
> > databases, which involve graphs (and vectors, and variant payload data
> > types), but this is almost completely irrelevant to how SQL, NoSQL and
> > K/V database (and DHT based) stores work, and the greatest amount of
> > interesting stuff happening these days in computer systems is all
> > around distributed systems, an area that is rarefied amongst the
> > rarefied field of CS itself.
> >
> > Sorry if I went off on a rant there, but I hope it explains why I am
> > posing questions and asking for advice from more experienced gophers
> > about how to do these things. Google searches turned up almost nothing,
> > and nothing that directly addressed my modelling, even reading closely
> > at code that appeared to implement this datatype agnosticism (within
> > the constraint of comparability and a nil value), so I wanted to engage
> > people in a discussion. I would be very pleased to learn that I have
> > missed important things in the field that render my design useless. I
> > first studied OOP back in the early 90s, and it seemed like some kind
> > of magic, but anyone reading this probably already agrees that OOP (and
> > functional) are not panaceas, and languages like Go are addressing this
> > issue, promoting correctness, maintainability, and efficiency all at
> > the same time. I have tangled with more OOP code than I wish I ever
> > had, and, to disclose, I did quite a bit of BASIC and Assembler coding
> > in my youth, and you never forget the thrill of discovering a property
> > in the numbers or the architecture that lets you shortcut previously
> > messy, thick and dense problems that faced programmers in the past.
> >
> > On Monday, 23 April 2018 20:06:08 UTC+3, rog wrote:
> >>
> >> On 22 April 2018 at 23:20, Louki Sumirniy
> >> <louki.sumir...@gmail.com> wrote:
> >> > I essentially am trying to find an effective method in Go, preferably
> >> > not too wordy, that lets me create an abstract data type, a struct,
> >> > and a set of functions that bind to a different data type, and that I
> >> > can write, preferably not in too much code, a change that allows the
> >> > data type of the embedded data to be changed. It's basically kinda
> >> > inheritance, but after much fiddling I found a hackish sorta way that
> >> > isn't *too* boilerplate filled:
> >> >
> >> > type nullTester func(*Bast, uint32) bool
> >> >
> >> > type Bast struct {
> >> >   ...
> >> >   isNull    nullTester
> >> >   ...
> >> > }
> >> >
> >> > func isNull(b *Bast, d uint32) bool {
> >> >   return d == 0
> >> > }
> >> >
> >> > func NewBast() (b *Bast) {
> >> >   ...
> >> >   b.isNull = isNull
> >> >   ...
> >> > }
> >> >
> >> > // IsNull - tests if a value in the tree is null
> >> > func (b *Bast) IsNull(d uint32) bool {
> >> >   return b.isNull(b, d)
> >> > }
> >> >
> >> > Now, bear in mind I haven't shown all of the code. But there is a
> >> > slice array in the Bast struct, defined as an interface{}, and isNull
> >> > is one of a set of operators that have to be written to match the
> >> > type used in the slice store. This might be a bad example because it
> >> > doesn't actually act on the interface typed slice, but the point here
> >> > is just this:
> >> >
> >> > It does not appear to be possible to make the type specification from
> >> > the top line match the function signature of the type-bound function
> >> > at the bottom of the code snippet. I haven't been able to find
> >> > anything that shows that a func type can have a method binding.
> >> >
> >> > https://github.com/calibrae-project/bast/blob/master/pkg/bast/bast.go
> >> > is where my WiP lives. This slightly hacky solution seems sound to
> >> > me, I just don't like to be forced to use workarounds like this. If a
> >> > type signature cannot be written that matches a method, yet I can do
> >> > it this way, I don't see what purpose this serves as far as any kind
> >> > of correctness and bug-resistance issues go. I would have to deal
> >> > with a lot more potential bugs if I had to concretely implement this
> >> > library for the sake of 1 slice and 7 functions out of a much larger
> >> > library that conceptually is intended to only deal with comparable,
> >> > mainly numerical values anyway.
> >>
> >> I don't really understand why you think you want all those function
> >> types. Your code seems to mix concerns quite a bit, but interfaces in
> >> Go are generally closely focused on a particular thing.
> >>
> >> I don't understand anything about your algorithm, but ISTM you could
> >> do something like this:
> >>
> >>     https://play.golang.org/p/wv0T8T8Ynns
> >>
> >> Some changes I made in doing that:
> >>
> >> - unexported a load of identifiers that really don't deserve to be
> >> part of the public API
> >> - made Cursor into a lightweight by-value type
> >> - defined a Store interface that doesn't require knowledge of the
> >> Cursor data type.
> >>
> >> A few other observations:
> >>
> >> - requiring a value type to know whether an item is null or not does
> >> not seem great when you're storing integers - who is to say that the
> >> zero uint32 is not a valid thing to store?
> >>
> >> - requiring callers to know about AddRow (and hence how the tree is
> >> organised) does not seem ideal.
> >>
> >> - there's no way to search for or insert items, but I guess you'll be
> >> doing that later.
> >>
> >> - it looks like your algorithm will always use a power-of-two multiple
> >> of the element size, which could end up very inefficient when the tree
> >> is large.
> >>
> >>   cheers,
> >>     rog.
