I'm using DecisionTree.jl to build a random forest model. The dataset is small: 200 items with 664 predictors each, and the input file is under 1 MB.

I can build a random forest model with 1000 trees in about 8 seconds - 
great.

@time model = build_forest(yvalues[:, 1], features, 2, 1000, 0.5)

Then I tried to save the model for subsequent scoring by writing it to a 
JLD file.

Writing to an NFS-mounted disk took multiple minutes and produced a 194 MB 
(!!) file.

If I write to /dev/shm instead, it still takes 51 seconds (and the file is 
still 194 MB):

@time save("/dev/shm/foo.jld","model",model)
 51.406531 seconds (12.01 M allocations: 465.667 MB, 0.38% gc time)
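As a point of comparison (just a sketch of something to try, not a claim that it fixes the JLD slowness): timing Julia's built-in serialize() on the same object would show whether the overhead lies in JLD's HDF5 encoding of the tree structs or in the object itself. Note that Base.serialize's format is Julia-version-specific, so it's only suitable for short-lived caching, not long-term storage.

```julia
# Sketch: compare built-in binary serialization against JLD for the same model.
# (Base.serialize's on-disk format is not stable across Julia versions.)
@time open("/dev/shm/foo.bin", "w") do io
    serialize(io, model)
end
filesize("/dev/shm/foo.bin")  # compare against the 194 MB JLD file
```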

When I do something comparable in R with the same dataset (build the model, 
then use save() to write the model and the features), the whole process 
takes about 14 seconds and leaves a 2.8 MB file on disk. The save() step 
itself is very fast.

whos() shows

                         model   6884 KB     DecisionTree.Ensemble

so if that is a good estimate of the object's memory footprint, I don't 
think the problem is with the DecisionTree object itself.

Am I doing something wrong, or is JLD doing something horrible?

I saw https://github.com/JuliaLang/julia/issues/7893, so perhaps the 
problem still persists?

