To report back, my experience with Mocha.jl has been very good. The 
following is an example of how one can do regression with Mocha. It 
assumes two data files, "train.dat" and "test.dat", which are plain 
ASCII files, space delimited, with variables in columns. The outputs are 
in columns 1-9, and the inputs in the remaining columns (adjust this to 
fit your needs). The net as configured in the example has two hidden 
layers, of 300 and 40 neurons, respectively. In my application there are 
40 inputs and 9 outputs, and this net works very well with a training 
set of 2e5 observations and a test set of 2e4 observations. Training 
with CUDA is very fast; I was pleasantly surprised. I did it using a GPU 
instance on Amazon EC2. Using the C backend it's considerably slower, 
but the net can still be trained well in less than 24 hours. If you're 
training a number of nets, I'd say that making the effort to take 
advantage of CUDA is definitely worthwhile.
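To make the example below self-contained, here is a sketch of how one could generate the two data files in the expected layout. The linear-plus-noise data-generating process and the seed are made up purely for illustration; substitute your own data. (This uses the old-Julia `srand`/`writedlm` from Base, matching the script below; on Julia >= 0.7 you would need `using DelimitedFiles` and `Random.seed!` instead.)

```julia
# Sketch: write synthetic "train.dat" and "test.dat" in the layout the
# example expects: 9 output columns first, then 40 input columns,
# space delimited. The linear DGP here is hypothetical.
srand(54321)
function make_data(n)
    X = randn(n, 40)               # 40 inputs
    B = randn(40, 9)               # arbitrary coefficient matrix
    Y = X*B + 0.1*randn(n, 9)      # 9 noisy outputs
    return [Y X]                   # outputs in columns 1-9, inputs after
end
writedlm("train.dat", make_data(200000), ' ')
writedlm("test.dat", make_data(20000), ' ')
```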
############################################################
# select backend
############################################################
#ENV["MOCHA_USE_NATIVE_EXT"] = "true"
ENV["MOCHA_USE_CUDA"] = "true"
############################################################
# other setup
############################################################
#ENV["OMP_NUM_THREADS"] = 1
#blas_set_num_threads(1)
using Mocha
srand(12345678)
backend = DefaultBackend()
init(backend)
snapshot_dir = "300_40_snapshots"
############################################################
# Load the data (already pre-processed)
############################################################
train_inp = readdlm("train.dat")
Y = train_inp[:,1:9]
X = train_inp[:,10:end]
Y = Y'
X = X'
test_inp = readdlm("test.dat")
YT = test_inp[:,1:9]
XT = test_inp[:,10:end]
YT = YT'
XT = XT'
############################################################
# Define network
############################################################
# specify sizes of layers
# best so far is 300, 40: 0.143, 0.085, better than 80, 40
Layer1Size = 300
Layer2Size = 40
#Layer3Size = 30
#Layer4Size = 20
# create the network
data = MemoryDataLayer(batch_size=2000, data=Array[X,Y])
h1 = InnerProductLayer(name="ip1",neuron=Neurons.Tanh(), 
output_dim=Layer1Size, tops=[:pred1], bottoms=[:data])
h2 = InnerProductLayer(name="ip2",neuron=Neurons.Tanh(), 
output_dim=Layer2Size, tops=[:pred2], bottoms=[:pred1])
#h3 = InnerProductLayer(name="ip3", neuron=Neurons.Tanh(),
#    output_dim=Layer3Size, tops=[:pred3], bottoms=[:pred2])
#h4 = InnerProductLayer(name="ip4", neuron=Neurons.Tanh(),
#    output_dim=Layer4Size, tops=[:pred4], bottoms=[:pred3])
output = InnerProductLayer(name="aggregator", output_dim=9, tops=[:output], 
bottoms=[:pred2] )
loss_layer = SquareLossLayer(name="loss", bottoms=[:output, :label])
common_layers = [h1,h2,output]
net = Net("dsge-train", backend, [data, common_layers, loss_layer])
# create the validation network
datatest = MemoryDataLayer(batch_size=20000, data=Array[XT,YT])
accuracy = SquareLossLayer(name="acc", bottoms=[:output, :label])
net_test = Net("dsge-test", backend, [datatest, common_layers, accuracy])
test_performance = ValidationPerformance(net_test)
############################################################
# Solve
############################################################
lr_policy = LRPolicy.DecayOnValidation(0.02, "test-accuracy-accuracy", 0.9)
method = SGD()
params = make_solver_parameters(method, regularization_type="L2", 
regu_coef=0.000, mom_policy=MomPolicy.Fixed(0.9), max_iter=300000, 
lr_policy=lr_policy, load_from=snapshot_dir)
solver = Solver(method, params)
add_coffee_break(solver, TrainingSummary(), every_n_iter=1000)
add_coffee_break(solver, Snapshot(snapshot_dir), every_n_iter=1000)
add_coffee_break(solver, test_performance, every_n_iter=1000)
# link the decay-on-validation policy with the actual performance validator
setup(lr_policy, test_performance, solver)
solve(solver, net)
Mocha.dump_statistics(solver.coffee_lounge, get_layer_state(net, "loss"), 
true)
destroy(net)
destroy(net_test)
shutdown(backend)