For those who don't know: Mocha is a Deep Learning framework for Julia <http://julialang.org/>, inspired by the C++ Deep Learning framework Caffe <http://caffe.berkeleyvision.org/>. Some highlights:
- *Modular Architecture*: Mocha has a clean architecture with isolated components like network layers, activation functions, solvers, regularizers, initializers, etc. The built-in components are sufficient for typical deep (convolutional) neural network applications, and more are being added in each release. All of them can be easily extended by defining custom sub-types.
- *High-level Interface*: Mocha is written in Julia <http://julialang.org/>, a high-level dynamic programming language designed for scientific computing. Combined with the expressive power of Julia and its package ecosystem, playing with deep neural networks in Mocha is easy and intuitive. See, for example, our IJulia Notebook of using a pre-trained ImageNet model to do image classification <http://nbviewer.ipython.org/github/pluskid/Mocha.jl/blob/master/examples/ijulia/ilsvrc12/imagenet-classifier.ipynb>.
- *Portability and Speed*: Mocha comes with multiple backends that can be switched transparently.
  - The *pure Julia backend* is portable -- it runs on any platform that supports Julia. It is reasonably fast on small models thanks to Julia's LLVM-based just-in-time (JIT) compiler and Performance Annotations <http://julia.readthedocs.org/en/latest/manual/performance-tips/#performance-annotations>, and can be very useful for prototyping.
  - The *native extension backend* can be turned on when a C++ compiler is available. It runs 2~3 times faster than the pure Julia backend.
  - The *GPU backend* uses NVidia® cuDNN <https://developer.nvidia.com/cuDNN>, cuBLAS and customized CUDA kernels to provide highly efficient computation. A 20~30 times or even greater speedup can be observed on a modern GPU device, especially on larger models.
- *Compatibility*: Mocha uses the widely adopted HDF5 format to store both datasets and model snapshots, making it easy to inter-operate with Matlab, Python (numpy) and other existing computational tools.
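To give a flavor of how the backends above are switched, here is a minimal sketch. It assumes the environment-variable flags and backend types described in the Mocha documentation (`MOCHA_USE_NATIVE_EXT`, `MOCHA_USE_CUDA`, `CPUBackend`, `GPUBackend`); treat the exact names as illustrative rather than authoritative for your installed version.

```julia
# Sketch: selecting a Mocha backend (names assumed from the Mocha docs).
# The flags must be set before the package is loaded.
ENV["MOCHA_USE_NATIVE_EXT"] = "true"   # build/use the C++ native extension
# ENV["MOCHA_USE_CUDA"] = "true"       # or: use the cuDNN/cuBLAS GPU backend

using Mocha

backend = CPUBackend()   # swap in GPUBackend() when CUDA is enabled
init(backend)

# ... define the net and solver against `backend`, then train ...

shutdown(backend)
```

Because the rest of the model definition only refers to `backend`, moving a script from the pure Julia backend to the native extension or the GPU is a matter of changing these few lines, not the network code.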
Mocha also provides tools to import trained model snapshots from Caffe.
- *Correctness*: the computational components of Mocha in all backends are extensively covered by unit tests.
- *Open Source*: Mocha is licensed under the MIT "Expat" License <https://github.com/pluskid/Mocha.jl/blob/master/LICENSE.md>.

And here is the changelog for v0.0.4:

v0.0.4 2014.12.09
- Network
  - Parameter (l2-norm) constraints (@stokasto)
  - Random shuffling for HDF5 data layer
  - ConcatLayer
- Infrastructure
  - Momentum policy (@stokasto)
  - Save training statistics to file and plot tools (@stokasto)
  - Coffee breaks now have a coffee lounge
  - Auto-detect whether the CUDA kernel needs an update
  - Stochastic Nesterov Accelerated Gradient solver
  - Solver refactoring:
    - Behaviors for coffee breaks are simplified
    - Solver state variables like iteration now have clearer semantics
  - Support loading external pre-trained models for fine-tuning
  - Support explicit weight-sharing layers
  - Behaviors of layers taking multiple inputs made clear and unit-tested
  - Refactoring:
    - Removed the confusing System type
    - CuDNNBackend renamed to GPUBackend
    - Cleaned up cuBLAS API (@stokasto)
    - Layers are now organized by characterization properties
- Robustness
  - Various explicit topology verifications for Net, with unit tests
  - Increased unit test coverage for rare cases
  - Updated dependency to HDF5.jl 0.4.7
- Documentation
  - A new MNIST example using fully connected and dropout layers (@stokasto)
  - Reproducible MNIST results with a fixed random seed (@stokasto)
  - Tweaked IJulia Notebook image classification example
  - Documentation for solvers and coffee breaks

Best,
pluskid
