Thank you; it's very useful to have examples.
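For anyone who wants to run the example below end to end: it assumes "train.dat" and "test.dat" already exist. Here is a rough, untested sketch that writes synthetic files in the expected layout (space delimited, outputs in columns 1-9, inputs in the remaining 40 columns; the sizes and the random placeholder data are just assumptions taken from the description):

# sketch only: write synthetic data files in the layout the example expects
srand(1)
n_in, n_out = 40, 9
for (fname, n) in [("train.dat", 200000), ("test.dat", 20000)]
    X = randn(n, n_in)           # placeholder inputs, one row per observation
    Y = randn(n, n_out)          # placeholder outputs
    writedlm(fname, [Y X], ' ')  # outputs in columns 1-9, then the inputs
end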
On Monday, March 7, 2016 at 9:16:46 AM UTC-5, [email protected] wrote:
> To report back, my experience with Mocha.jl has been very good. The
> following is an example of how one can do regression with Mocha. It
> assumes that there are two data files, "train.dat" and "test.dat", which
> are plain ASCII files, space delimited, with variables in columns. The
> outputs are in columns 1-9, and the inputs are in the remaining columns
> (adjust this to fit your needs). The net as configured in the example has
> two hidden layers, of 300 and 40 neurons, respectively. In my
> application, there are 40 inputs and 9 outputs, and this net works very
> well, with a training set of 2e5 observations and a test set of 2e4
> observations. Doing the training using CUDA is very fast; I was
> pleasantly surprised. I did it using a GPU instance of Amazon EC2. Using
> the C backend it's considerably slower, but the net can still be trained
> well in less than 24 hours. For training a number of nets, I'd say that
> making the effort to take advantage of CUDA is definitely worthwhile.
>
> ############################################################
> # select backend
> ############################################################
> #ENV["MOCHA_USE_NATIVE_EXT"] = "true"
> ENV["MOCHA_USE_CUDA"] = "true"
>
> ############################################################
> # other setup
> ############################################################
> #ENV["OMP_NUM_THREADS"] = 1
> #blas_set_num_threads(1)
> using Mocha
> srand(12345678)
> backend = DefaultBackend()
> init(backend)
> snapshot_dir = "300_40_snapshots"
>
> ############################################################
> # Load the data (already pre-processed)
> ############################################################
> train_inp = readdlm("train.dat")
> Y = train_inp[:,1:9]
> X = train_inp[:,10:end]
> # Mocha expects observations in the last dimension, so transpose
> Y = Y'
> X = X'
> test_inp = readdlm("test.dat")
> YT = test_inp[:,1:9]
> XT = test_inp[:,10:end]
> YT = YT'
> XT = XT'
>
> ############################################################
> # Define network
> ############################################################
> # specify sizes of layers
> # best so far is 300, 40: 0.143, 0.085; better than 80, 40
> Layer1Size = 300
> Layer2Size = 40
> #Layer3Size = 30
> #Layer4Size = 20
>
> # create the network
> data = MemoryDataLayer(batch_size=2000, data=Array[X,Y])
> h1 = InnerProductLayer(name="ip1", neuron=Neurons.Tanh(),
>     output_dim=Layer1Size, tops=[:pred1], bottoms=[:data])
> h2 = InnerProductLayer(name="ip2", neuron=Neurons.Tanh(),
>     output_dim=Layer2Size, tops=[:pred2], bottoms=[:pred1])
> #h3 = InnerProductLayer(name="ip3", neuron=Neurons.Tanh(),
> #    output_dim=Layer3Size, tops=[:pred3], bottoms=[:pred2])
> #h4 = InnerProductLayer(name="ip4", neuron=Neurons.Tanh(),
> #    output_dim=Layer4Size, tops=[:pred4], bottoms=[:pred3])
> output = InnerProductLayer(name="aggregator", output_dim=9,
>     tops=[:output], bottoms=[:pred2])
> loss_layer = SquareLossLayer(name="loss", bottoms=[:output, :label])
> common_layers = [h1, h2, output]
> net = Net("dsge-train", backend, [data, common_layers..., loss_layer])
>
> # create the validation network
> datatest = MemoryDataLayer(batch_size=20000, data=Array[XT,YT])
> accuracy = SquareLossLayer(name="acc", bottoms=[:output, :label])
> net_test = Net("dsge-test", backend, [datatest, common_layers..., accuracy])
> test_performance = ValidationPerformance(net_test)
>
> ############################################################
> # Solve
> ############################################################
> lr_policy = LRPolicy.DecayOnValidation(0.02, "test-accuracy-accuracy", 0.9)
> method = SGD()
> params = make_solver_parameters(method, regularization_type="L2",
>     regu_coef=0.000, mom_policy=MomPolicy.Fixed(0.9), max_iter=300000,
>     lr_policy=lr_policy, load_from=snapshot_dir)
> solver = Solver(method, params)
>
> add_coffee_break(solver, TrainingSummary(), every_n_iter=1000)
> add_coffee_break(solver, Snapshot(snapshot_dir), every_n_iter=1000)
> add_coffee_break(solver, test_performance, every_n_iter=1000)
>
> # link the decay-on-validation policy with the actual performance validator
> setup(lr_policy, test_performance, solver)
>
> solve(solver, net)
>
> Mocha.dump_statistics(solver.coffee_lounge, get_layer_state(net, "loss"),
>     true)
>
> destroy(net)
> destroy(net_test)
> shutdown(backend)
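One small note for anyone adapting this: the quoted code loads data that is "already pre-processed", but the post doesn't say what that involves. A common choice for tanh networks is to standardize every column to zero mean and unit variance, using the training set's statistics for both files. A minimal, untested sketch of that (assuming the same file layout as above; note it rescales the output columns too, which changes the scale of the reported loss):

# sketch only: standardize all columns using training-set statistics
train = readdlm("train.dat")
test  = readdlm("test.dat")
mu = mean(train, 1)   # column means, from the training set only
sd = std(train, 1)    # column standard deviations
writedlm("train.dat", (train .- mu) ./ sd, ' ')
writedlm("test.dat",  (test  .- mu) ./ sd, ' ')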
