I'm just starting out with Julia, so please forgive me if this is a
simplistic question.
I'm using the DecisionTree package, which generates an Ensemble of
decision trees in the code below:
######
using DataFrames
using DecisionTree
clarity = readtable("/Users/marcstein/Active/julia/clarity.csv");
head(clarity)
labels = array(clarity[:, 41]);
features = array(clarity[:, 1:40]);
# Random Forest Classifier
# train random forest classifier
# using 2 random features, 10 trees, and 0.5 portion of samples per tree (optional)
model = build_forest(labels, features, 2, 10, 0.5)
# apply learned model
outcome = apply_forest(model,
[2,761,0,0,2,1.32,74,0,365,3,2,15,10,1,0,1,24,36,2000,0,1,1,0,0,0,1,0,0,0,1,5,1,0,2,0,2,220,221,220,221])
# # run n-fold cross validation for forests
# # using 2 random features, 10 trees, 3 folds and 0.5 of samples per tree (optional)
accuracy = nfoldCV_forest(labels, features, 2, 10, 3, 0.5)
score = (mean(accuracy[1:3]))
println(outcome)
println(score)
######
This code works fine. However, because the training DataFrame is quite
large, I want to build the model and store it in one app, then load it and
generate the outcome in a second app.
It seems like this should be simple: persist the model to a file, then load
it and pass it to apply_forest.
I can't find a way, though, to persist the model. If I try
writedlm(outfile,model)
I get:
EnsembleERROR: `start` has no method matching start(::Ensemble)
in writedlm at datafmt.jl:535
in writedlm at datafmt.jl:554
If I try:
print(outfile,model)
the output file contains:
Ensemble of Decision Trees
Trees: 10
Avg Leaves: 117.8
Avg Depth: 20.8
which is a summary of the Ensemble, not the individual elements.
I'm clearly missing something, but I haven't been able to figure it out so
far.
Any suggestions would be greatly appreciated!
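
To make the goal concrete, the save-then-load workflow described above might look like the sketch below. It is an illustration under assumptions, not a confirmed answer: it uses the Serialization standard library from Julia 1.x (newer than the syntax in the code above), ToyModel is a hypothetical stand-in for the trained Ensemble, the file name is made up, and whether a real Ensemble round-trips this way is untested here.

```julia
using Serialization  # standard library in Julia 1.x (assumption: newer Julia than the code above)

# Hypothetical stand-in for the trained Ensemble returned by build_forest
struct ToyModel
    thresholds::Vector{Float64}
end

# "App 1": build the model (faked here) and persist it to disk
model_path = joinpath(mktempdir(), "model.jls")  # illustrative file name
model = ToyModel([0.5, 1.5, 2.5])
open(model_path, "w") do io
    serialize(io, model)
end

# "App 2": load the stored model; the real code would then call apply_forest on it
restored = open(deserialize, model_path)
println(restored.thresholds)
```

In a real second app, the type definitions of the stored object would have to be loaded before deserializing (for the actual model, that would presumably mean `using DecisionTree`), and the serialized format is not guaranteed to be readable across different Julia versions.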