Hi Julia Users,

I'm one of the core devs in the BioJulia organisation, with a background in 
evolutionary biology and genetics. Together with a few other contributors, 
I'm writing Bio.jl's Phylo submodule.

The primary type of this submodule is the Phylogeny, a composite type used 
to describe a model of evolution. At the very minimum it looks like this:

type PhyNode
    children::Vector{PhyNode}
    parent::PhyNode
    
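    function PhyNode(children::Vector{PhyNode} = PhyNode[],
                     parent = nothing)
        x = new()
        # Initialise the child list before x is handed to graft!.
        x.children = PhyNode[]
        if parent != nothing
            # graft! is defined elsewhere in the submodule; it attaches
            # its second argument as a child of its first.
            graft!(parent, x)
        else
            # A node without a parent refers to itself as its parent.
            x.parent = x
        end
        for child in children
            graft!(x, child)
        end
        return x
    end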
end

type Phylogeny
    root::PhyNode
    rooted::Bool
    rerootable::Bool

    Phylogeny() = new(PhyNode(), false, true)
end

PhyNodes link to their children and to their parent - they are the 
individual objects that form the tree structure. The Phylogeny type 
describes the overall tree: it holds a reference to the PhyNode that forms 
the root, and records whether the tree is rooted in the phylogenetic sense 
and whether it is re-rootable. So far so good. We can represent the 
structure of a phylogeny - a model of how various species are related 
through history.

Here is where I'd like comments from julia-users, if possible. Additional 
information is often annotated onto a phylogeny: branch lengths, confidence 
intervals, sequences, labels, colours for plotting, and so on. We can do 
this with a Dict that uses PhyNodes as keys:

typealias NodeAnnotation{T} Dict{PhyNode, T}
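
As a rough sketch of what such an annotation looks like in use: the example 
below is written in current Julia syntax (mutable struct, const alias) 
rather than the type/typealias syntax above, DemoNode and DemoAnnotation 
are hypothetical stand-ins for PhyNode and NodeAnnotation, and the branch 
lengths are made up. Because the node type is mutable, the Dict looks keys 
up by object identity unless hash and == are overloaded.

```julia
# Hypothetical stand-in for PhyNode; the real type also links parent and children.
mutable struct DemoNode
    label::String
end

# Counterpart of `typealias NodeAnnotation{T} Dict{PhyNode, T}` in current syntax.
const DemoAnnotation{T} = Dict{DemoNode,T}

a = DemoNode("a")
b = DemoNode("b")

# One Dict per annotation, each with a concrete value type.
branchlengths = DemoAnnotation{Float64}(a => 0.12, b => 0.30)

# Lookup is by identity: a different node with the same label is not a key.
haskey(branchlengths, a)              # true
haskey(branchlengths, DemoNode("a"))  # false
```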

We can then store these annotations in the Phylogeny type like this:

type Phylogeny{S <: AbstractString}
    root::PhyNode
    rooted::Bool
    rerootable::Bool
    annotations::Dict{S, Any}
end

However, I don't like the type uncertainty of Any because, if I'm correct, 
it could propagate up through a user's code. We will always have some 
uncertainty, though, because we don't know in advance what the user might 
want to annotate the Phylogeny with - it could be anything from simple 
float values to other composite types.
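
To make the concern concrete, here is a minimal sketch in current Julia 
syntax. The annotations Dict is a hypothetical stand-in for the field 
above, with Symbols in place of PhyNode keys and made-up values. As far as 
I understand it, a lookup is inferred as Any, and that uncertainty lasts 
until the value is type-asserted or passed to another function (a 
"function barrier"), which is then compiled for the concrete type it 
receives.

```julia
# Hypothetical stand-in for Phylogeny.annotations; Symbols stand in for PhyNode keys.
annotations = Dict{String,Any}(
    "branch length" => Dict{Symbol,Float64}(:a => 0.5, :b => 0.25),
)

# The declared value type is Any, so that is all inference knows about a lookup.
lookup(anns, name) = anns[name]

# Function barrier: this is compiled for the concrete type of `lengths`
# that it is actually called with.
total(lengths) = sum(values(lengths))

# One-element vector holding the inferred return type of lookup, which is Any.
inferred = Base.return_types(lookup, (Dict{String,Any}, String))

result = total(lookup(annotations, "branch length"))  # 0.75
```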

Am I correct that the uncertainty in getting and setting such annotations 
would propagate through the user's code when they deal with annotations? 
If so, we have tried to think of ways to get around this. One idea was to 
store the NodeAnnotations in the phylogeny according to the type of their 
values, and then provide getter and setter methods that make the return 
type predictable from the types of the arguments passed to the method:

type Phylogeny{S<:AbstractString}
    root::PhyNode
    rooted::Bool
    rerootable::Bool
    annotations::Dict{Type, Dict{S, NodeAnnotation{Any}}}
end

function setannotation!{T}(x::Phylogeny, name::ASCIIString,
                           ann::NodeAnnotation{T})
    if haskey(x.annotations, T)
        x.annotations[T][name] = ann
    else
        # Dict(...) replaces the deprecated [name => ann] literal.
        x.annotations[T] = Dict(name => ann)
    end
end

function getannotations{T}(x::Phylogeny, name::ASCIIString, ::Type{T})
    x.annotations[T][name]::Dict{PhyNode, T}
end

This seems to work and would indeed make getting and setting more 
type-predictable. The only annoying part is that the Dicts get converted:

julia> setannotation!(tree, "Node Names", NodeAnnotation{ASCIIString}())
Dict{PhyNode,ASCIIString} with 0 entries


julia> tree
Phylogeny{ASCIIString}(PhyNode(),false,false,Dict{Type{T},Dict{ASCIIString,
Dict{PhyNode,Any}}}(ASCIIString=>Dict("Node Names"=>Dict{PhyNode,Any}())))

You see Dict{PhyNode, ASCIIString} got converted to Dict{PhyNode, Any}.
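
My guess at why this happens: the inner Dict is declared with value type 
NodeAnnotation{Any}, and Julia's parametric types are invariant 
(Dict{PhyNode, ASCIIString} is not a subtype of Dict{PhyNode, Any}), so 
assignment converts the value into a new Dict{PhyNode, Any}. That would 
also make the ::Dict{PhyNode, T} assertion in getannotations fail for any T 
other than Any. One possible way around it, sketched below in current Julia 
syntax, is to store the inner values as Any and assert the concrete type on 
the way out. Node, Annotation and Store are hypothetical stand-ins for 
PhyNode, NodeAnnotation and Phylogeny, not the Bio.jl definitions.

```julia
# Hypothetical stand-in for PhyNode.
mutable struct Node
    name::String
end

const Annotation{T} = Dict{Node,T}

# Inner values are declared as Any, so storing an Annotation{T}
# does not convert it to Dict{Node,Any}.
struct Store
    annotations::Dict{Type,Dict{String,Any}}
end
Store() = Store(Dict{Type,Dict{String,Any}}())

function setannotation!(s::Store, name::String, ann::Annotation{T}) where {T}
    # Create the per-type table on first use, then store the Dict as-is.
    inner = get!(() -> Dict{String,Any}(), s.annotations, T)
    inner[name] = ann
    return ann
end

function getannotation(s::Store, name::String, ::Type{T}) where {T}
    # The assertion gives callers a concrete return type for each T.
    return s.annotations[T][name]::Annotation{T}
end

s = Store()
n = Node("a")
labels = Annotation{String}(n => "Homo sapiens")
setannotation!(s, "Node Names", labels)
got = getannotation(s, "Node Names", String)
got === labels  # true: the same Dict comes back, unconverted
```

The cost of this design is that a mismatched name/type pair only fails at 
run time, either with a KeyError or at the type assertion.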

If anyone has comments on this or has advice on how to prevent type 
uncertainty propagating, please do share. How should we be approaching this?

Many thanks,
Ben.
