[ https://issues.apache.org/jira/browse/MAHOUT-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202642#comment-13202642 ]
Dirk Weißenborn commented on MAHOUT-968:
----------------------------------------
Here comes the first patch! I hope it works!
Training:
org.apache.mahout.classifier.rbm.training.RBMClassifierTrainingJob
can be run with MapReduce or locally, e.g.:
java org.apache.mahout.classifier.rbm.training.RBMClassifierTrainingJob --input dirOrFile --output pathWhereModelShouldBeWritten --labelcount 10 --epochs 30 --monitor
Training consists of three steps: bias initialization, greedy pretraining and
finetuning. It is also possible to run only some of these steps at a time
(options: --nogreedy, --nobiases, --nofinetuning).
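For readers unfamiliar with the first two steps, here is a minimal sketch of what they usually look like for one layer of binary units. It is not code from the patch: it assumes the common visible-bias heuristic b_i = log(p_i / (1 - p_i)), where p_i is the fraction of training examples in which unit i is on, and plain CD-1 (one-step contrastive divergence) for pretraining. The DBM paper [1] uses a modified layer-wise procedure, so treat this only as an illustration. The names RbmSketch, initVisibleBiases, cd1Update and reconstructionError are hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical sketch, not code from the MAHOUT-968 patch: one binary RBM
 * layer with data-driven visible-bias initialization and a CD-1 update.
 */
class RbmSketch {
    final int numVisible;
    final int numHidden;
    final double[][] w;       // w[i][j] connects visible unit i and hidden unit j
    final double[] visBias;
    final double[] hidBias;
    private final Random rng;

    RbmSketch(int numVisible, int numHidden, long seed) {
        this.numVisible = numVisible;
        this.numHidden = numHidden;
        this.w = new double[numVisible][numHidden];
        this.visBias = new double[numVisible];
        this.hidBias = new double[numHidden];
        this.rng = new Random(seed);
        // Assumption: small Gaussian initial weights (standard deviation 0.01).
        for (int i = 0; i < numVisible; i++)
            for (int j = 0; j < numHidden; j++)
                w[i][j] = 0.01 * rng.nextGaussian();
    }

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Step 1 (assumed heuristic): b_i = log(p_i / (1 - p_i)). */
    void initVisibleBiases(double[][] data) {
        for (int i = 0; i < numVisible; i++) {
            double p = 0;
            for (double[] v : data) p += v[i];
            p /= data.length;
            // Clamp so that units that are always off (or always on) do not give log(0).
            p = Math.min(Math.max(p, 1e-3), 1 - 1e-3);
            visBias[i] = Math.log(p / (1 - p));
        }
    }

    double[] hiddenProbs(double[] v) {
        double[] h = new double[numHidden];
        for (int j = 0; j < numHidden; j++) {
            double a = hidBias[j];
            for (int i = 0; i < numVisible; i++) a += v[i] * w[i][j];
            h[j] = sigmoid(a);
        }
        return h;
    }

    double[] visibleProbs(double[] h) {
        double[] v = new double[numVisible];
        for (int i = 0; i < numVisible; i++) {
            double a = visBias[i];
            for (int j = 0; j < numHidden; j++) a += h[j] * w[i][j];
            v[i] = sigmoid(a);
        }
        return v;
    }

    /** Step 2, one minibatch of CD-1; each call starts from the weights left by the previous call. */
    void cd1Update(double[][] batch, double learningRate) {
        double[][] dw = new double[numVisible][numHidden];
        double[] dv = new double[numVisible];
        double[] dh = new double[numHidden];
        for (double[] v0 : batch) {
            double[] h0 = hiddenProbs(v0);                  // positive phase
            double[] h0s = new double[numHidden];           // sampled binary hidden states
            for (int j = 0; j < numHidden; j++) h0s[j] = rng.nextDouble() < h0[j] ? 1 : 0;
            double[] v1 = visibleProbs(h0s);                // one-step reconstruction
            double[] h1 = hiddenProbs(v1);                  // negative phase
            for (int i = 0; i < numVisible; i++) {
                for (int j = 0; j < numHidden; j++) dw[i][j] += v0[i] * h0[j] - v1[i] * h1[j];
                dv[i] += v0[i] - v1[i];
            }
            for (int j = 0; j < numHidden; j++) dh[j] += h0[j] - h1[j];
        }
        double scale = learningRate / batch.length;
        for (int i = 0; i < numVisible; i++) {
            for (int j = 0; j < numHidden; j++) w[i][j] += scale * dw[i][j];
            visBias[i] += scale * dv[i];
        }
        for (int j = 0; j < numHidden; j++) hidBias[j] += scale * dh[j];
    }

    /** Squared reconstruction error per example after one mean-field up-down pass. */
    double reconstructionError(double[][] data) {
        double err = 0;
        for (double[] v : data) {
            double[] v1 = visibleProbs(hiddenProbs(v));
            for (int i = 0; i < numVisible; i++) err += (v[i] - v1[i]) * (v[i] - v1[i]);
        }
        return err / data.length;
    }
}
```

Because every cd1Update call reads the weights written by the previous call, minibatches have to be processed one after another. This is the sequential dependency that limits the benefit of Map/Reduce when the minibatches are small.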
Testing:
org.apache.mahout.classifier.rbm.test.TestRBMClassifierJob; its usage should be
self-explanatory.
Preparation of the MNIST dataset (in examples):
org.apache.mahout.classifier.rbm.MnistPreparer
"--size" is the number of examples that are split into "--chunknumber"
minibatches (or chunks); labelpath and imagepath refer to the training or test
data files of the MNIST dataset.
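MnistPreparer itself is not shown here, so the following is only a guess at what such a preparation step involves. It parses the MNIST IDX files (big-endian header: magic number 2049 for label files and 2051 for image files, then the item count and, for images, the row and column counts, followed by one unsigned byte per label or pixel) and splits the first "--size" examples into "--chunknumber" contiguous chunks of nearly equal size. The class and method names (MnistChunkSketch, readLabels, readImages, chunkIndices) are hypothetical, and scaling pixels to [0, 1] is an assumption suited to RBM inputs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the patch's MnistPreparer. */
class MnistChunkSketch {
    static final int LABEL_MAGIC = 2049; // 0x00000801, IDX label file
    static final int IMAGE_MAGIC = 2051; // 0x00000803, IDX image file

    /** Reads an IDX label file: magic, count, then one unsigned byte per label. */
    static int[] readLabels(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // readInt is big-endian, as IDX requires
        int magic = data.readInt();
        if (magic != LABEL_MAGIC) throw new IOException("unexpected label magic: " + magic);
        int count = data.readInt();
        int[] labels = new int[count];
        for (int k = 0; k < count; k++) labels[k] = data.readUnsignedByte();
        return labels;
    }

    /** Reads an IDX image file and scales each pixel from 0..255 to [0, 1]. */
    static double[][] readImages(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != IMAGE_MAGIC) throw new IOException("unexpected image magic: " + magic);
        int count = data.readInt();
        int rows = data.readInt();
        int cols = data.readInt();
        double[][] images = new double[count][rows * cols];
        for (int k = 0; k < count; k++)
            for (int p = 0; p < rows * cols; p++)
                images[k][p] = data.readUnsignedByte() / 255.0;
        return images;
    }

    /** Splits indices 0..size-1 into chunkNumber contiguous chunks whose sizes differ by at most one. */
    static List<int[]> chunkIndices(int size, int chunkNumber) {
        if (chunkNumber <= 0) throw new IllegalArgumentException("chunkNumber must be positive");
        List<int[]> chunks = new ArrayList<>();
        int base = size / chunkNumber;
        int extra = size % chunkNumber; // the first `extra` chunks get one more example
        int start = 0;
        for (int c = 0; c < chunkNumber; c++) {
            int len = base + (c < extra ? 1 : 0);
            int[] chunk = new int[len];
            for (int k = 0; k < len; k++) chunk[k] = start + k;
            chunks.add(chunk);
            start += len;
        }
        return chunks;
    }
}
```

Keeping chunks contiguous and nearly equal in size gives every minibatch roughly the same number of examples, which keeps the per-minibatch update step comparable across chunks.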
I am running my own tests on the MNIST dataset right now, and they are nearly
done. They take some time because of the dataset's size, but it is manageable.
I can upload the trained model if someone wants it for testing.
> Classifier based on restricted boltzmann machines
> -------------------------------------------------
>
> Key: MAHOUT-968
> URL: https://issues.apache.org/jira/browse/MAHOUT-968
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Dirk Weißenborn
> Labels: classification, mnist
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> This is a proposal for a new classifier based on restricted Boltzmann
> machines. The development of this feature follows the paper "Deep
> Boltzmann Machines" (DBM) [1] from 2009. The proposed model (DBM) achieved
> an error rate of 0.95% on the MNIST dataset [2], which is very good. The
> main parts of the implementation should also be applicable to scenarios
> other than classification in which restricted Boltzmann machines are used
> (ref. MAHOUT-375).
> I am working on this feature right now, and the results are promising. The
> only problem with the training algorithm is that it is still mostly
> sequential (when training batches are small, as they should be), which so
> far makes Map/Reduce of little benefit. However, since the algorithm itself
> is fast (for a training algorithm), training can be done on a single machine
> in manageable time.
> Testing of the algorithm is currently done on the MNIST dataset itself to
> reproduce the results of [1]. As soon as the results indicate that
> everything is working fine, I will upload the patch.
> [1] http://www.cs.toronto.edu/~hinton/absps/dbm.pdf
> [2] http://yann.lecun.com/exdb/mnist/
--
This message is automatically generated by JIRA.