Hi all,
I'm implementing BSP trees, and I've run into a style issue which I'm
uncertain about.
For my current purposes, the trees will be holding floats, but I wanted to
write things in a way which allowed other possibilities later. At first, I
made an explicit Leaf type:
*Version 1*
abstract Leaf
type FloatLeaf <: Leaf
value::Float64
end
abstract Split # abstract type for space partition
# starting out with axis-aligned splits, to implement arbitrary-angle later.
# Axis-aligned splits make rectangles, hence RectSplit
type RectSplit <: Split
var::Int
loc::Float64
end
# internal tree node. Two sides of a split are pos and neg.
type TreeNode
split::Split # split definition
pos::Union(TreeNode, Leaf) # negative side
neg::Union(TreeNode, Leaf) # positive side
end
(This is actually a bit simplified; I also had a special Nothing type to
indicate an empty case. The NA type from dataframes was not appropriate
because I wanted different behavior.)
I implemented functions on the trees mainly by using multi-dispatch in a
pattern-matching style, matching against TreeNode or Leaf types to
differentiate between the recursive case and the base case.
I eventually decided that the Leaf type seemed rather pointless, because it
was just a container for a value. Using a Leaf type created some annoyance
with defining functions to operate on leaves, when the obvious thing to do
was just apply those functions to the leaf values. Also, I was a little
worried that putting stuff inside a Leaf container would be a small
efficiency hit (probably not much?).
End result, I modified to this:
*Version 2*
abstract Split
type RectSplit <: Split
var::Int
loc::Float64
end
type TreeNode{a}
split::Split # split definition
pos::Union(TreeNode{a}, a) # negative side
neg::Union(TreeNode{a}, a) # positive side
end
This is shorter, which seems like a good sign. It also manages to indicate
that all leafs should have the same type, which the previous version
didn't. However, I couldn't keep using multiple dispatch to define the
different cases for my functions. The closest thing I could do would be to
make the cases which match to Leaf type match to Float instead. This works
when the tree is only holding floats, but I wanted to handle more general
cases.
Since this didn't seem right, I switched most of my code to use if-else in
one function rather than multiple dispatch on the types. Leaving the leaf
case for the final "else" allows me to write things in a generic way (no
explicit assumption that I'm dealing with Floats in functions that don't
need to interact with this assumption).
However, this approach still doesn't work if I want to write a method which
applies to trees (be they internal nodes or leaf nodes) but allows other
function definitions for other types. Anything which isn't a TreeNode is
now assumed to be a leaf, the way I've done it. This isn't actually a
problem yet, but it could be at some point.
Does this mean I was wrong to switch to the 2nd version?
Any other suggestions for how this should go?
Best,
Abram