[jira] [Issue Comment Edited] (MAHOUT-968) Classifier based on restricted boltzmann machines

Issue Comment Edited Tue, 07 Feb 2012 11:17:25 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202642#comment-13202642
 ]


Dirk Weißenborn edited comment on MAHOUT-968 at 2/7/12 7:15 PM:
----------------------------------------------------------------

Here comes the first patch! I hope it works.
training: 
org.apache.mahout.classifier.rbm.training.RBMClassifierTrainingJob

It can be run with MapReduce or locally, e.g.:
java org.apache.mahout.classifier.rbm.training.RBMClassifierTrainingJob --input 
dirOrFile --output pathWhereModelShouldBeWritten --labelcount 10 --epochs 30 
--monitor

Training consists of three steps: bias initialization, greedy pretraining, and 
finetuning. However, it is possible to train the model on only some of them at 
a time (options: --nogreedy, --nobiases, --nofinetuning).
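A minimal sketch of how the three skip options could select the training steps, assuming each --no... option simply disables the matching step. The class, enum, and method names below are hypothetical and are not taken from the patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these names come from the patch.
final class TrainingPlan {

  /** The three training steps named above, in their stated order. */
  enum Step { INIT_BIASES, GREEDY_PRETRAINING, FINETUNING }

  /** Returns the steps that remain after applying the skip options in args. */
  static List<Step> stepsFor(String... args) {
    List<String> options = Arrays.asList(args);
    List<Step> steps = new ArrayList<>();
    // Assumption: each option disables exactly one step.
    if (!options.contains("--nobiases")) {
      steps.add(Step.INIT_BIASES);
    }
    if (!options.contains("--nogreedy")) {
      steps.add(Step.GREEDY_PRETRAINING);
    }
    if (!options.contains("--nofinetuning")) {
      steps.add(Step.FINETUNING);
    }
    return steps;
  }

  public static void main(String[] args) {
    // Skip bias initialization and pretraining, so only finetuning runs.
    System.out.println(stepsFor("--nobiases", "--nogreedy")); // prints [FINETUNING]
  }
}
```

With no skip option given, all three steps are returned in order, which matches the default full training run described above.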

testing: 
org.apache.mahout.classifier.rbm.test.TestRBMClassifierJob (its usage should be 
clear)

Preparation of the MNIST dataset (in examples):
org.apache.mahout.classifier.rbm.MnistPreparer
"size" is the number of examples to be processed into "chunknumber" 
minibatches (or chunks); "labelpath" and "imagepath" refer to the label and 
image files of the MNIST training or test data.
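To illustrate the relationship between "size" and "chunknumber", here is a small sketch of one way to split a number of examples into a number of chunks. How the patch actually handles a remainder is not stated here, so the even distribution below is an assumption, and the class and method names are hypothetical:

```java
// Hypothetical illustration only: not the MnistPreparer implementation.
final class ChunkSizes {

  /** Splits size examples into chunkNumber chunks whose sizes differ by at most one. */
  static int[] chunkSizes(int size, int chunkNumber) {
    if (size < 0 || chunkNumber <= 0) {
      throw new IllegalArgumentException("size must be >= 0 and chunkNumber > 0");
    }
    int[] sizes = new int[chunkNumber];
    int base = size / chunkNumber;      // examples every chunk gets
    int remainder = size % chunkNumber; // leftover examples, one each for the first chunks
    for (int i = 0; i < chunkNumber; i++) {
      sizes[i] = base + (i < remainder ? 1 : 0);
    }
    return sizes;
  }

  public static void main(String[] args) {
    // The MNIST training set contains 60000 images.
    int[] sizes = chunkSizes(60000, 600);
    System.out.println(sizes.length + " chunks of " + sizes[0] + " examples");
  }
}
```

For example, 60000 examples with a chunk number of 600 would give minibatches of 100 examples each.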

I am running my own tests on the MNIST dataset right now, and they are nearly 
done. They take some time because of the dataset's size, but it is manageable. 
I can upload the trained model if someone wants it for testing.

                
> Classifier based on restricted boltzmann machines
> -------------------------------------------------
>
>                 Key: MAHOUT-968
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-968
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.7
>            Reporter: Dirk Weißenborn
>              Labels: classification, mnist
>             Fix For: 0.7
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This is a proposal for a new classifier based on restricted Boltzmann 
> machines. The development of this feature follows the 2009 paper on "Deep 
> Boltzmann Machines" (DBM) [1]. The proposed model (DBM) achieved an 
> error rate of 0.95% on the MNIST dataset [2], which is really good. The main 
> parts of the implementation should also be applicable to scenarios other than 
> classification where restricted Boltzmann machines are used (ref. MAHOUT-375).
> I am working on this feature right now, and the results are promising. The 
> only problem with the training algorithm is that it is still mostly 
> sequential (if training batches are small, which they should be), so 
> Map/Reduce has not really been beneficial so far. However, since the 
> algorithm itself is fast (for a training algorithm), training can be done on 
> a single machine in manageable time.
> Testing of the algorithm is currently done on the MNIST dataset itself to 
> reproduce the results of [1]. As soon as the results indicate that everything 
> is working fine, I will upload the patch.
> [1] http://www.cs.toronto.edu/~hinton/absps/dbm.pdf
> [2] http://yann.lecun.com/exdb/mnist/
