Ok, here is what I think: the two algorithms have different objectives. To train the deep Boltzmann machine, you have to greedily train all of its restricted Boltzmann machines in a specific order, and this greedy pretraining takes up most of the total classifier training time. The pretraining does not take an input together with a label; it takes just the input and maximizes the probability of that input. So what the model actually learns is the joint distribution p(x,y) and not p(y|x). This has the advantage that, given a label y, the network can sample good examples for x, so the model is not just a discriminative model but also a generative one.

The training algorithm as described in the paper takes an input, samples "what it thinks the input should be", and penalizes everything that differs from the input. Each iteration needs the model with the weights updated in the previous iteration, so I can't see how it could be parallelized any better without changing the algorithm. What is done in parallel is computing the gradient for each training case in a batch; these gradients are then summed up to form the weight updates.

Hope that was more or less what you wanted... As I said, from my standpoint I can't see many parallels, because the approaches are different. I will still take a closer look at the paper from MS.
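To make the parallelization point a bit more concrete, here is a rough, self-contained sketch of a CD-1-style update for a single binary RBM. This is not the actual MAHOUT-968 code, and all class and method names are made up for illustration; the point is only that the per-case gradients inside a batch are independent and can be summed in parallel, while the single weight update at the end is what every following batch has to wait for.

import java.util.Random;

/** Toy CD-1 update for one binary RBM (biases omitted); all names are illustrative only. */
public class RbmCd1Sketch {
    final int numVisible;
    final int numHidden;
    final double[][] weights;   // weights[i][j] connects visible unit i to hidden unit j
    final Random rng = new Random(42);

    RbmCd1Sketch(int numVisible, int numHidden) {
        this.numVisible = numVisible;
        this.numHidden = numHidden;
        this.weights = new double[numVisible][numHidden];
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /** p(h_j = 1 | v) for every hidden unit j. */
    double[] hiddenProbs(double[] v) {
        double[] p = new double[numHidden];
        for (int j = 0; j < numHidden; j++) {
            double act = 0.0;
            for (int i = 0; i < numVisible; i++) act += v[i] * weights[i][j];
            p[j] = sigmoid(act);
        }
        return p;
    }

    /** p(v_i = 1 | h) for every visible unit i. */
    double[] visibleProbs(double[] h) {
        double[] p = new double[numVisible];
        for (int i = 0; i < numVisible; i++) {
            double act = 0.0;
            for (int j = 0; j < numHidden; j++) act += h[j] * weights[i][j];
            p[i] = sigmoid(act);
        }
        return p;
    }

    double[] sampleBinary(double[] probs) {
        double[] s = new double[probs.length];
        for (int k = 0; k < probs.length; k++) s[k] = rng.nextDouble() < probs[k] ? 1.0 : 0.0;
        return s;
    }

    /**
     * One CD-1 step on a mini-batch. The loop over training cases is the part that can run
     * in parallel; the single weight update at the end is what forces the batches themselves
     * to be processed one after another.
     */
    void cd1Update(double[][] batch, double learningRate) {
        double[][] grad = new double[numVisible][numHidden];
        for (double[] v0 : batch) {                 // independent per training case
            double[] ph0 = hiddenProbs(v0);         // positive phase
            double[] h0 = sampleBinary(ph0);
            double[] v1 = visibleProbs(h0);         // "what the model thinks the input should be"
            double[] ph1 = hiddenProbs(v1);         // negative phase
            for (int i = 0; i < numVisible; i++)
                for (int j = 0; j < numHidden; j++)
                    grad[i][j] += v0[i] * ph0[j] - v1[i] * ph1[j];
        }
        for (int i = 0; i < numVisible; i++)        // summed gradients -> one weight update
            for (int j = 0; j < numHidden; j++)
                weights[i][j] += learningRate * grad[i][j] / batch.length;
    }
}

(The real DBM training in [1] is more involved than plain CD-1, but the parallel-within-a-batch, sequential-across-batches structure is the same.)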
Dirk

2012/2/2 Ted Dunning <[email protected]>

> The algorithm in the Microsoft paper is not *labeled* as a neural network,
> but it is still a generalized linear model. That means it is essentially a
> single-level neural network.
>
> The key importance of the paper is that it represents a Bayesian estimate
> of the distribution for each parameter. Then, when probabilities are being
> estimated, it gives back a stochastic value. This can be used to select
> which training data to use, which is useful in either multi-armed bandit
> settings or in active learning.
>
> Updates are driven through the network using message passing.
>
> The use of stochastic training ideas and message passing were the essence
> of my question. In the Microsoft paper, the parameters can be viewed as
> stochastic generators connected by weights to the output.
>
> Differences occur in the way that the variances (temperature) are
> decreased, but aside from the nomenclatural differences, it seems like
> there might be a common core.
>
> Thus, my question to you is "does this common core exist"?
>
>
> On Wed, Feb 1, 2012 at 1:57 PM, Dirk Weissenborn <
> [email protected]> wrote:
>
> > Hello Ted,
> >
> > I would have to study the paper you've given me a little bit first. What
> > I could do at the moment is give a short and simple overview of the model
> > and algorithm I am implementing... The Deep Boltzmann Machines that I am
> > using for classification are artificial neural networks based on stacked
> > restricted Boltzmann machines. I think the models are quite different in
> > how exactly they work, since the model in the paper you gave me isn't an
> > artificial neural network or anything close to one at first glance.
> > Therefore it seems to me that comparing the algorithms is quite difficult.
> > If you would still like a comparison, I will see what I can do.
> >
> > regards
> > Dirk
> >
> > 2012/2/1 Ted Dunning <[email protected]>
> >
> > > Dirk,
> > >
> > > Can you provide some comparison of RBMs with the Bayesian learning
> > > algorithm such as described here:
> > > http://research.microsoft.com/apps/pubs/default.aspx?id=122779
> > >
> > > On Wed, Feb 1, 2012 at 3:32 AM, Dirk Weißenborn (Created) (JIRA) <
> > > [email protected]> wrote:
> > >
> > > > Classifier based on restricted Boltzmann machines
> > > > -------------------------------------------------
> > > >
> > > >                 Key: MAHOUT-968
> > > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-968
> > > >             Project: Mahout
> > > >          Issue Type: New Feature
> > > >          Components: Classification
> > > >            Reporter: Dirk Weißenborn
> > > >
> > > > This is a proposal for a new classifier based on restricted Boltzmann
> > > > machines. The development of this feature follows the 2009 paper on
> > > > "Deep Boltzmann Machines" (DBM) [1]. The proposed model (DBM) achieved
> > > > an error rate of 0.95% on the MNIST dataset [2], which is really good.
> > > > Main parts of the implementation should also be applicable to scenarios
> > > > other than classification where restricted Boltzmann machines are used
> > > > (ref. MAHOUT-375).
> > > > I am working on this feature right now, and the results are promising.
> > > > The only problem with the training algorithm is that it is still mostly
> > > > sequential (if training batches are small, which they should be), which
> > > > makes Map/Reduce not really beneficial so far.
> > > > However, since the algorithm itself is fast (for a training algorithm),
> > > > training can be done on a single machine in manageable time.
> > > > Testing of the algorithm is currently done on the MNIST dataset itself
> > > > to reproduce the results of [1]. As soon as the results indicate that
> > > > everything is working fine, I will upload the patch.
> > > >
> > > > [1] http://www.cs.toronto.edu/~hinton/absps/dbm.pdf
> > > > [2] http://yann.lecun.com/exdb/mnist/
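Reading Ted's description above again, a rough sketch of that kind of model (as I understand the description, not the Microsoft paper itself) might look like the following: each weight carries its own Gaussian (mean, variance), and predictions can be made by sampling the weights, which gives the stochastic probability estimates that are useful for bandit-style selection of training data. The update shown is only a plain SGD stand-in for the means, not the Gaussian message-passing update of the paper, and all names are hypothetical.

import java.util.Arrays;
import java.util.Random;

/**
 * Toy Bayesian-flavoured logistic model: each weight carries a Gaussian (mean, variance).
 * Sampling the weights before scoring gives the stochastic probability estimates; the
 * update below is a plain SGD stand-in, NOT the message-passing update of the MS paper.
 */
public class BayesianGlmSketch {
    final double[] mean;       // posterior means of the weights
    final double[] variance;   // posterior variances of the weights
    final Random rng = new Random(1);

    BayesianGlmSketch(int dim, double priorVariance) {
        mean = new double[dim];
        variance = new double[dim];
        Arrays.fill(variance, priorVariance);
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /**
     * Thompson-style stochastic estimate: draw one weight vector from the posterior and
     * score with it. Repeated calls return different probabilities for the same input,
     * which is what makes it usable for selecting training data (bandits / active learning).
     */
    double sampledProbability(double[] x) {
        double score = 0.0;
        for (int i = 0; i < x.length; i++) {
            double w = mean[i] + Math.sqrt(variance[i]) * rng.nextGaussian();
            score += w * x[i];
        }
        return sigmoid(score);
    }

    /**
     * Crude online update of the means only (an ordinary logistic-regression SGD step).
     * A faithful implementation would instead pass Gaussian messages through the factor
     * graph and shrink the variances as evidence accumulates.
     */
    void update(double[] x, int label, double learningRate) {
        double meanScore = 0.0;
        for (int i = 0; i < x.length; i++) meanScore += mean[i] * x[i];
        double error = label - sigmoid(meanScore);
        for (int i = 0; i < x.length; i++) mean[i] += learningRate * error * x[i];
    }
}

The obvious structural difference from the RBM sketch above is that here every parameter keeps an explicit uncertainty, whereas in contrastive-divergence training the weights are plain point estimates.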
