To report back, my experience with Mocha.jl has been very good. The
following is an example of how one can do regression with Mocha. It
assumes there are two data files, "train.dat" and "test.dat": plain
ASCII, space delimited, variables in columns. The outputs are in
columns 1-9, and the inputs are in the remaining columns (adjust this
to fit your needs). The net as configured in the example has two hidden
layers, of 300 and 40 neurons, respectively. In my application, there
are 40 inputs and 9 outputs, and this net works very well, with a
training set of 2e5 observations and a test set of 2e4 observations.
Training with CUDA is very fast; I was pleasantly surprised. I did it
using a GPU instance on Amazon EC2. Using the C backend, it's
considerably slower, but the net can still be trained in less than 24
hours. For training a number of nets, I'd say that making the effort to
take advantage of CUDA is definitely worthwhile.
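If you just want to exercise the script before plugging in real data,
something like the following writes files in the expected format
(synthetic noise with the dimensions from my application, so the net
won't learn anything meaningful, but the pipeline will run):

# generate placeholder data files: 9 output columns followed by
# 40 input columns, space delimited
srand(1)
writedlm("train.dat", randn(200000, 49), ' ')
writedlm("test.dat", randn(20000, 49), ' ')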
############################################################
# select backend
############################################################
#ENV["MOCHA_USE_NATIVE_EXT"] = "true"
ENV["MOCHA_USE_CUDA"] = "true"
############################################################
# other setup
############################################################
#ENV["OMP_NUM_THREADS"] = 1
#blas_set_num_threads(1)
using Mocha
srand(12345678)
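# DefaultBackend() resolves to the GPU backend when MOCHA_USE_CUDA is
# set above, and falls back to the CPU backend otherwise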
backend = DefaultBackend()
init(backend)
snapshot_dir = "300_40_snapshots"
############################################################
# Load the data (already pre-processed)
############################################################
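# Mocha expects one observation per column (the last blob dimension is
# the mini-batch dimension), hence the transposes below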
train_inp = readdlm("train.dat")
Y = train_inp[:,1:9]
X = train_inp[:,10:end]
Y = Y'
X = X'
test_inp = readdlm("test.dat")
YT = test_inp[:,1:9]
XT = test_inp[:,10:end]
YT = YT'
XT = XT'
############################################################
# Define network
############################################################
# specify sizes of layers
# best so far is 300, 40: 0.143, 0.085; better than 80, 40
Layer1Size = 300
Layer2Size = 40
#Layer3Size = 30
#Layer4Size = 20
# create the network
data = MemoryDataLayer(batch_size=2000, data=Array[X,Y])
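# the data layer produces two blobs, named :data and :label by default,
# matching the two arrays passed in data=Array[X,Y]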
h1 = InnerProductLayer(name="ip1", neuron=Neurons.Tanh(),
    output_dim=Layer1Size, tops=[:pred1], bottoms=[:data])
h2 = InnerProductLayer(name="ip2", neuron=Neurons.Tanh(),
    output_dim=Layer2Size, tops=[:pred2], bottoms=[:pred1])
#h3 = InnerProductLayer(name="ip3", neuron=Neurons.Tanh(),
#    output_dim=Layer3Size, tops=[:pred3], bottoms=[:pred2])
#h4 = InnerProductLayer(name="ip4", neuron=Neurons.Tanh(),
#    output_dim=Layer4Size, tops=[:pred4], bottoms=[:pred3])
output = InnerProductLayer(name="aggregator", output_dim=9, tops=[:output],
    bottoms=[:pred2])
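# square loss over each mini-batch: (1/2N) * sum_i ||output_i - label_i||^2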
loss_layer = SquareLossLayer(name="loss", bottoms=[:output, :label])
common_layers = [h1,h2,output]
net = Net("dsge-train", backend, [data, common_layers..., loss_layer])
# create the validation network
datatest = MemoryDataLayer(batch_size=20000, data=Array[XT,YT])
accuracy = SquareLossLayer(name="acc", bottoms=[:output, :label])
net_test = Net("dsge-test", backend, [datatest, common_layers..., accuracy])
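# wraps the validation net so it can be run periodically during training
# and its statistics reported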
test_performance = ValidationPerformance(net_test)
############################################################
# Solve
############################################################
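# start the learning rate at 0.02 and decay it by a factor of 0.9 based
# on the validation statistic named by the key (this must match what the
# validation net reports into the statistics store)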
lr_policy = LRPolicy.DecayOnValidation(0.02, "test-accuracy-accuracy", 0.9)
method = SGD()
params = make_solver_parameters(method, regularization_type="L2",
    regu_coef=0.000, mom_policy=MomPolicy.Fixed(0.9), max_iter=300000,
    lr_policy=lr_policy, load_from=snapshot_dir)
solver = Solver(method, params)
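# coffee breaks: print a training summary, save a snapshot (load_from
# above resumes from these if present), and run the validation net,
# each every 1000 iterations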
add_coffee_break(solver, TrainingSummary(), every_n_iter=1000)
add_coffee_break(solver, Snapshot(snapshot_dir), every_n_iter=1000)
add_coffee_break(solver, test_performance, every_n_iter=1000)
# link the decay-on-validation policy with the actual performance validator
setup(lr_policy, test_performance, solver)
solve(solver, net)
Mocha.dump_statistics(solver.coffee_lounge, get_layer_state(net, "loss"), true)
destroy(net)
destroy(net_test)
shutdown(backend)