I'm using DecisionTree.jl to build a random forest model on a small dataset: 200 items, 664 predictors per item, input file under 1 MB.
I can build a random forest model with 1000 trees in about 8 seconds -
great.
@time model = build_forest(yvalues[:,1], features, 2, 1000, 0.5)
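For context, here is a self-contained sketch of the setup. The random data is a stand-in for my real input (which I haven't attached); the build_forest arguments are the same as above (2 features per split, 1000 trees, 0.5 subsampling fraction):

```julia
using DecisionTree

# Synthetic stand-in for the real data: 200 items, 664 predictors each
features = rand(200, 664)
yvalues  = rand(200)

# 2 candidate features per split, 1000 trees, 0.5 subsampling fraction
@time model = build_forest(yvalues, features, 2, 1000, 0.5)
```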
Then I tried to save that model for subsequent scoring by writing it to a
JLD file.
Writing to an NFS-mounted disk took several minutes and produced a 194 MB
(!!) file.
If I write instead to /dev/shm, it still takes 51 seconds (and is still 194 MB):
@time save("/dev/shm/foo.jld","model",model)
51.406531 seconds (12.01 M allocations: 465.667 MB, 0.38% gc time)
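As a possible point of comparison (I haven't benchmarked this variant myself), Julia's built-in serializer could be timed against JLD on the same object; foo.jls is just a made-up path:

```julia
# Sketch: time Base serialization of the same model for comparison with JLD.save.
@time open("/dev/shm/foo.jls", "w") do io
    serialize(io, model)   # Julia's native serialization format
end
```

Note that Base serialize output is not guaranteed stable across Julia versions, so it's only useful as a speed/size comparison, not as a portable model format.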
When I do something comparable in R with the same dataset (building the model
and then using save() to store the model and the features), the whole process
takes about 14 seconds and produces a 2.8 MB file on disk. The save() step
itself is very fast.
whos() shows
    model    6884 KB    DecisionTree.Ensemble
so if that is a good estimate of the in-memory size, I don't think the problem
is the DecisionTree object itself.
Am I doing something wrong, or is JLD doing something horrible?
I saw https://github.com/JuliaLang/julia/issues/7893, so perhaps the problems
reported there still persist?