I'm just starting out with Julia, so please forgive me if this is a 
simplistic question.

I'm using the DecisionTree package which generates an Ensemble of 
DecisionTrees in the code below:


######

using DataFrames
using DecisionTree

clarity = readtable("/Users/marcstein/Active/julia/clarity.csv");
head(clarity)

labels = array(clarity[:, 41]);
features = array(clarity[:, 1:40]);

# Random Forest Classifier

# train random forest classifier
# using 2 random features, 10 trees, and 0.5 portion of samples per tree (optional)

model = build_forest(labels, features, 2, 10, 0.5)

# apply learned model

outcome = apply_forest(model, 
[2,761,0,0,2,1.32,74,0,365,3,2,15,10,1,0,1,24,36,2000,0,1,1,0,0,0,1,0,0,0,1,5,1,0,2,0,2,220,221,220,221])

# run n-fold cross validation for forests
# using 2 random features, 10 trees, 3 folds and 0.5 of samples per tree (optional)

accuracy = nfoldCV_forest(labels, features, 2, 10, 3, 0.5)

score = (mean(accuracy[1:3]))

println(outcome)
println(score)

######


This code works fine. But because the DataFrame that is the training set is 
quite large, I want to build the model and store it in one app and then 
load the model and generate the outcome in a second app. 

It seems like this should be simple: persist the model to a file in the 
first app, then load it in the second app and pass it to the apply_forest 
function.

I can't find a way, though, to persist the model. If I try:

writedlm(outfile,model)

I get:

EnsembleERROR: `start` has no method matching start(::Ensemble)
 in writedlm at datafmt.jl:535
 in writedlm at datafmt.jl:554

If I try:

print(outfile,model)

the output file contains:

Ensemble of Decision Trees
Trees:      10
Avg Leaves: 117.8
Avg Depth:  20.8

which is a summary of the Ensemble, not the individual elements.

I'm clearly missing something, but I haven't been able to figure it out so 
far.
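
To make the goal concrete, here is a rough sketch of the two-step workflow 
I'm after. It assumes Julia's generic serialize/deserialize functions can 
round-trip the value; I have not confirmed that this works for an Ensemble. 
The stand-in model and the file name are made up for illustration.

```julia
using Serialization  # standard library in Julia 1.x; older releases exposed these functions from Base

# Stand-in for the trained model (hypothetical); in my case this would be
# the Ensemble returned by build_forest.
model = Dict("trees" => 10, "avg_depth" => 20.8)

# Hypothetical file name, placed in a temporary directory for the sketch.
model_path = joinpath(mktempdir(), "forest_model.jls")

# First app: write the model to disk.
open(model_path, "w") do io
    serialize(io, model)
end

# Second app: read the model back.
restored = open(model_path, "r") do io
    deserialize(io)
end

# The second app would then call something like:
# outcome = apply_forest(restored, sample)
```

If this is the right direction, I assume the second app would still need 
`using DecisionTree` before loading, so that the Ensemble type is defined 
when the file is read.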

Any suggestions would be greatly appreciated!

