Thank you; it's very useful to have examples.

On Monday, March 7, 2016 at 9:16:46 AM UTC-5, [email protected] wrote:
>
> To report back, my experience with Mocha.jl has been very good. The 
> following is an example of how one can do regression with Mocha. It 
> assumes that there are two data files, "train.dat" and "test.dat", 
> which are plain ASCII files, space delimited, with variables in 
> columns. The outputs are in columns 1-9, and the inputs are in the 
> remaining columns (adjust this to fit your needs). The net as 
> configured in the example has two hidden layers, of 300 and 40 
> neurons, respectively. In my application, there are 40 inputs and 9 
> outputs, and this net works very well, with a training set of 2e5 
> observations and a test set of 2e4 observations. Training with CUDA 
> is very fast; I was pleasantly surprised. I did it using a GPU 
> instance on Amazon EC2. Using the C backend, it's considerably 
> slower, but the net can still be trained in less than 24 hours. For 
> training a number of nets, I'd say that making the effort to take 
> advantage of CUDA is definitely worthwhile.
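
For anyone else trying this, it's easy to mock up data files in the layout
described above (outputs in columns 1-9, inputs in the remaining columns).
The file names and column layout below follow the post; the sample sizes and
the random values are just placeholders for illustration:

# standalone sketch: synthetic "train.dat" / "test.dat" in the expected layout
n_train = 1000   # placeholder size, for illustration only
n_test  = 100
n_out   = 9      # outputs in columns 1-9, as in the post
n_in    = 40     # inputs in the remaining columns
writedlm("train.dat", [randn(n_train, n_out) randn(n_train, n_in)], ' ')
writedlm("test.dat",  [randn(n_test,  n_out) randn(n_test,  n_in)], ' ')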
>
>
>
> ############################################################
> # select backend
> ############################################################
> #ENV["MOCHA_USE_NATIVE_EXT"] = "true"
> ENV["MOCHA_USE_CUDA"] = "true"
> ############################################################
> # other setup
> ############################################################
> #ENV["OMP_NUM_THREADS"] = 1
> #blas_set_num_threads(1)
> using Mocha
> srand(12345678)
> backend = DefaultBackend()
> init(backend)
> snapshot_dir = "300_40_snapshots"
> ############################################################
> # Load the data (already pre-processed)
> ############################################################
> train_inp = readdlm("train.dat")
> Y = train_inp[:,1:9]
> X = train_inp[:,10:end]
> Y = Y'
> X = X'
> test_inp = readdlm("test.dat")
> YT = test_inp[:,1:9]
> XT = test_inp[:,10:end]
> YT = YT'
> XT = XT'
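
A note for other readers on the transposes above: as far as I understand,
Mocha follows the Caffe convention that the last array dimension is the
mini-batch dimension, so after transposing, each column is one observation.
With the sizes quoted in the post (40 inputs, 9 outputs, 2e5 training and
2e4 test observations), these sanity checks should pass:

# data orientation after the transposes: features x observations
@assert size(X)  == (40, 200_000)
@assert size(Y)  == (9,  200_000)
@assert size(XT) == (40, 20_000)
@assert size(YT) == (9,  20_000)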
> ############################################################
> # Define network
> ############################################################
> # specify sizes of layers
> # best so far is 300, 40: 0.143, 0.085, better than 80, 40
> Layer1Size = 300
> Layer2Size = 40
> #Layer3Size = 30
> #Layer4Size = 20
> # create the network
> data = MemoryDataLayer(batch_size=2000, data=Array[X,Y])
> h1 = InnerProductLayer(name="ip1", neuron=Neurons.Tanh(), output_dim=Layer1Size, tops=[:pred1], bottoms=[:data])
> h2 = InnerProductLayer(name="ip2", neuron=Neurons.Tanh(), output_dim=Layer2Size, tops=[:pred2], bottoms=[:pred1])
> #h3 = InnerProductLayer(name="ip3", neuron=Neurons.Tanh(), output_dim=Layer3Size, tops=[:pred3], bottoms=[:pred2])
> #h4 = InnerProductLayer(name="ip4", neuron=Neurons.Tanh(), output_dim=Layer4Size, tops=[:pred4], bottoms=[:pred3])
> output = InnerProductLayer(name="aggregator", output_dim=9, tops=[:output], bottoms=[:pred2])
> loss_layer = SquareLossLayer(name="loss", bottoms=[:output, :label])
> common_layers = [h1,h2,output]
> net = Net("dsge-train", backend, [data, common_layers, loss_layer])
> # create the validation network
> datatest = MemoryDataLayer(batch_size=20000, data=Array[XT,YT])
> accuracy = SquareLossLayer(name="acc", bottoms=[:output, :label])
> net_test = Net("dsge-test", backend, [datatest, common_layers, accuracy])
> test_performance = ValidationPerformance(net_test)
> ############################################################
> # Solve
> ############################################################
> lr_policy = LRPolicy.DecayOnValidation(0.02, "test-accuracy-accuracy", 0.9)
> method = SGD()
> params = make_solver_parameters(method, regularization_type="L2", regu_coef=0.000, mom_policy=MomPolicy.Fixed(0.9), max_iter=300000, lr_policy=lr_policy, load_from=snapshot_dir)
> solver = Solver(method, params)
> add_coffee_break(solver, TrainingSummary(), every_n_iter=1000)
> add_coffee_break(solver, Snapshot(snapshot_dir), every_n_iter=1000)
> add_coffee_break(solver, test_performance, every_n_iter=1000)
> # link the decay-on-validation policy with the actual performance validator
> setup(lr_policy, test_performance, solver)
> solve(solver, net)
> Mocha.dump_statistics(solver.coffee_lounge, get_layer_state(net, "loss"), true)
> destroy(net)
> destroy(net_test)
> shutdown(backend)
>
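
In case it's useful: once training finishes (before the destroy/shutdown
calls at the end), out-of-sample prediction follows the same pattern as the
validation net, i.e. a net that reuses common_layers but has no loss layer,
so it shares the trained weights. The snippet below is an untested sketch;
XP is a hypothetical 40 x n matrix of new inputs (one observation per
column), and the output_blobs access is from my memory of the Mocha
pre-trained-model tutorial, so check it against the docs:

# sketch: forward pass on new inputs XP (40 x n)
data_pred = MemoryDataLayer(batch_size=size(XP,2), data=Array[XP], tops=[:data])
net_pred = Net("dsge-predict", backend, [data_pred, common_layers])
init(net_pred)
forward(net_pred)
pred = zeros(9, size(XP,2))
copy!(pred, net_pred.output_blobs[:output])  # predictions, one column per observation
destroy(net_pred)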
