I have been struggling with getting the most optimal, non-reflective and
performant way of implementing something that is a bit like OOP method
overloading. As some here may know, I am implementing a novel binary tree
based on complete trees and using ideas from B-heaps in order to achieve
maximum data cache locality in a search and sort algorithm.
Basically I want to avoid rewriting any code. Note that this is not really
at all an OOP paradigm that I am trying to ape with this. I want to isolate
a small set of type specific functions from the much larger general
functions that deal with tree walking, inserting and deleting. The consumer
of the code needs to see something that looks like a type specific library,
and not have to concern themselves with any type assertions or anything
that complicates their code so that in every way the calling code will look
the same except for the type of the payload being different and I am not
excessively concerned with a small amount of call overhead, as it is my
belief that the data locality structure performance will far exceed the
small extra cost for this encapsulation. If I am wrong, I suppose I can
rework it using a code generator to spit out type specific code, but I
don't want to do this at this stage because it complicates my workflow in
completing the proof of concept.
So, I'll just briefly explain what I have worked out. Note that I am still
really a beginner with Go coding, and my reason for posting this is to get
a little more feedback in case there is any other matters I have not
accounted for in my construction.
I have the base type that concerns itself with the walk, search, insert and
delete functions in an interface. Some of the functions deal solely with
the tree node payload type, and some also require as well tree walk
functions that are aware only of the abstract characteristics of the
concrete type, but have a small contact surface with the data type for the
reason of the page structure as well as the use of zero as the empty node.
(This theoretically could be changed in an implementation where zero is a
valid value but at the same time in a binary tree, implicitly zero is the
left-edge node anyway and in a binary tree we are trying to keep a sort
structure so there is only one place a valid null would ever be anyway)
What I have done is create a type struct that gathers all of these
functions that touch the concrete type, and a set of slightly differently
named functions from the ones the caller will use, that also need to be
passed the structure gathering the type specific functions and the walk and
search/insert delete functions that implicitly must be type aware but only
of the abstract features of the data, greater, less, equal and empty.
The concrete typed library then contains the type specific functions, these
are basically comparators, and then the end-user functions are written that
call functions in the base library that requires access to these type
specific functions, these are the walk functions (the walk to parent
function does not need to interact with the concrete type, it only needs to
understand the geometry of the data structure), but the left/right child
walks need to be aware of the type as in the structure zero is my
edge/empty sentinel. Then finally the maybe unique lateral walk functions
which use the left/right walk functions, these functions need to use the
simple left/right child walks but also need to know the boundaries of the
data structure as the allocation of pages in the tree needs to be sparse
in order to minimise memory utilisation at the bottom edges. It is a
complete tree implementation, essentially, but done in the same general way
as B-heaps, in order to maximise the performance gain by staying within the
CPU cache.
Of course, a simple code example is needed. First in the base library:
type O struct {
IsLeft, IsRight, IsEqual, IsEmpty func() bool
IsOutsideTree, IsInsideTree func() bool
LeftChild, RightChild func()
}
type B interface {
// These only deal with the concrete type
IsLeft(interface{}) bool
IsRight(interface{}) bool
IsEqual(interface{}) bool
IsEmpty() bool
// These deal with the concrete type as well as the parent type
BIsOutsideTree(O) bool
IsOutsideTree() bool
BIsInsideTree(O) bool
IsInsideTree() bool
BLeftChild(O)
LeftChild()
BRightChild(O)
RightChild()
BLeft(O)
Left()
BRight(O)
Right()
BFind(O)
BFind(O)
Find(interface{}) error
Binsert(O, interface{}) error
Insert(interface{}) error
BDelete(O, interface{}) error
Delete() error
}
I have only included in the above the functions that actually have any
contact with the concrete type.
Then, just one example, of how I overload or whatever it might be called,
so that the base library can depend on the composed/embedded derivative of
the base interface:
// LeftChild -
func (b *BastU32) LeftChild() {
b.BLeftChild(b.Overloads)
}
The function called in this can then use the functions passed into it
without knowing the type, as you can see none of them accept parameters or
return type specific values.
I bundle all of the functions in the structure at the top because while
several only require one or two of these functions, some require all of
them and passing the individual functions makes for a lot of typing in the
concrete type library.
The b.Overloads - I am not sure, whether it should be pass by value or pass
by reference. I think pass by reference adds an extra resolution step, but
it is probably more expensive as every call of the functions will have to
perform this dereference.
I hope this is clear enough and detailed enough to explain in adequately
concrete detail what I am trying to achieve and how I am doing it. I don't
think this is OOP patterning, as I only want to have one level of
abstraction and the only reason is that it reduces the size of the code
that needs to be maintained. The base library does not care what the
payload type is, only how to do these comparisons (the first 4 in the
interface) and the rest either need those comparators or need a function
that needs the comparators.
Using first class functions seemed to be the simplest way to implement
this. It was not otherwise possible to get the base library to use the type
specific library functions other than by passing the functions and
rewriting them to use the passed functions, and being there is a reasonably
sized collection, with 8 functions, and especially the ones in the bottom
half, nearly all of the set in the struct are required.
I know the base library functions naming scheme is maybe a bit clumsy, but
I don't think it's that important.
I think it would be nice if there was some less wordy way to perform the
override but this would literally be the most direct low level procedural
programming way of doing it anyway, and as I understand it OOP languages
hide this but that's how they implement. The big advantage is I can keep
all the tree and abstract type aware code in the base, and implementation
for a new type is not excessively long winded. All the non-B-prefixed
functions have to be written more or less the same as the leftchild example
above, and the New function that returns the 'class' to the caller only
directly see the type relevant stuff, the abstract stuff is not directly
exported.
The concrete type library is all the user of this library needs to know in
order to use it. They would have to go out of their way to tinker with the
base library. Because of Go's package scoping rules everything has to be
exported or this would not be possible to implement in separate packages,
and stuffing the different type implementations would both be ugly in terms
of naming the packages. Well, maybe I am wrong about that. If they all
lived in the same package folder all the delicate internals could be not
exported and thus not directly accessible.
I think that it is important to keep them apart so that another user of the
library could either make their own independent repository that imports my
base library, as there likely is a whole mountain of possible variations.
For one thing, key/value pairs, sorted by value, where the payload data is
a struct containing a hash key and something that refers to another data
set. I designed this originally for a purpose involving searching for half
hash-collisions, and so for that there would need to be distinct
comparators that only compare the first or last half of the data in the
payload, though in every other way the code would be the same. Possibly
these comparators could be overloaded by a user so only the greater/less
function (isleft isright) are different.
Thanks in advance for any feedback on this. I struggled for a long time to
figure out how to do this and I couldn't quite grasp related things. I
don't think there is really that many real cases where this kind of
abstraction actually serves any purpose, but any kind of search/indexing
system likely will need to be able to separate the payload from the
organising structure. And in an application where several are used (again,
my intended purpose for developing this) as well as being easier to
maintain code, the tree side of things is only needed to be stored once as
it is the same for any other payload data type library using this one,
which would reduce pressure on the code cache as well.
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.