Drew reminds me that posting to the list is good form. He had asked how SGD works, and my answer was a nutshell explanation.
On Mon, Jul 19, 2010 at 10:40 AM, Drew Farris &lt;[email protected]&gt; wrote:

> Thanks for the explanation -- in all it sounds pretty elegant. You think in
> terms of numbers of examples to avoid the problem of unbalanced training
> sets, and you train on the features that are most interesting in terms of
> providing new information. Time for me to start reading the code.
>
> (Would this be worth passing along to the list, or is it re-iterating
> something that has already been mentioned there?)
>
> On Mon, Jul 19, 2010 at 12:43 PM, Ted Dunning &lt;[email protected]&gt; wrote:
>
> > The basic idea is very, very simple. You take an example, figure out a
> > small change to the classifier that would make it do better for that
> > example, and change the classifier a little bit in that direction. There
> > are a few tricks.
> >
> > The stochastic part is also straightforward. The idea is that you think
> > of taking samples randomly from a distribution rather than from a set of
> > input examples. The practical effect is that you no longer think in terms
> > of passes through the training data, but rather in terms of the number of
> > examples seen. You can batch updates if you like, but the batch size is
> > not determined by the number of examples you have lying around. This
> > allows convergence in less than a single pass through the data (if the
> > problem is appropriate and the data large enough).
> >
> > A second wrinkle in MAHOUT-228 is the confidence-weighted learning hack.
> > The idea is that when a new training example shows you need to update the
> > classifier, that example is likely to have a combination of features that
> > you have seen many times and features that you have rarely seen. For the
> > features you have seen often, you probably don't want to learn much,
> > while for the features you have rarely seen, you probably want to learn a
> > bunch. In MAHOUT-228, I don't do the mathematically clever update that
> > Mark Dredze suggests. Instead, I just anneal the learning rate on each
> > term separately. The results are very impressive. This also takes care of
> > IDF weighting and stop lists.
> >
> > http://leon.bottou.org/projects/sgd
> > http://alias-i.com/lingpipe/demos/tutorial/logistic-regression/read-me.html
> > http://videolectures.net/icml08_pereira_cwl/
> >
> > On Mon, Jul 19, 2010 at 9:26 AM, Drew Farris &lt;[email protected]&gt; wrote:
> >
> > > I've spent a small amount of time with MAHOUT-228, enough to realize
> > > that I need to understand more details of the SGD approach in addition
> > > to diving into the code :)
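To make the "nudge the classifier a little" step concrete, here is a minimal sketch of a plain SGD update for logistic regression. The class and names (SgdSketch, LEARNING_RATE, and so on) are my own illustration, not anything in the MAHOUT-228 code:

    // One plain SGD step for logistic regression: look at one example,
    // move the weights a little toward getting that example right.
    public class SgdSketch {
      private final double[] weights;                 // one weight per feature
      private static final double LEARNING_RATE = 0.1;

      public SgdSketch(int numFeatures) {
        this.weights = new double[numFeatures];
      }

      /** Probability that this example belongs to the positive class. */
      public double classify(double[] features) {
        double dot = 0;
        for (int i = 0; i < weights.length; i++) {
          dot += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-dot));          // logistic function
      }

      /** One stochastic step: label is 0 or 1. */
      public void train(double[] features, int label) {
        double error = label - classify(features);    // gradient of the log-likelihood
        for (int i = 0; i < weights.length; i++) {
          weights[i] += LEARNING_RATE * error * features[i];
        }
      }
    }

Each call to train is one stochastic step. Whether those steps come one at a time or in small batches, nothing in the update cares how many passes you make over the data, which is why convergence in less than a single pass is even a meaningful statement.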

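And here is roughly what "anneal the learning rate on each term separately" amounts to, again as a sketch with made-up names rather than the actual MAHOUT-228 update. Each feature keeps its own update count, and its step size shrinks as it is seen more often, so rare features learn a bunch while common ones barely move:

    import java.util.HashMap;
    import java.util.Map;

    // Per-term learning-rate annealing for sparse examples (featureId -> value).
    // Illustrative only; not the MAHOUT-228 implementation.
    public class PerTermAnnealingSketch {
      private final Map<Integer, Double> weights = new HashMap<>();  // featureId -> weight
      private final Map<Integer, Integer> counts = new HashMap<>();  // featureId -> times updated
      private static final double BASE_RATE = 0.5;

      /** Probability of the positive class for a sparse example. */
      public double classify(Map<Integer, Double> example) {
        double dot = 0;
        for (Map.Entry<Integer, Double> e : example.entrySet()) {
          dot += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return 1.0 / (1.0 + Math.exp(-dot));
      }

      /** One SGD step with a separately annealed rate for every term; label is 0 or 1. */
      public void train(Map<Integer, Double> example, int label) {
        double error = label - classify(example);
        for (Map.Entry<Integer, Double> e : example.entrySet()) {
          int feature = e.getKey();
          int seen = counts.getOrDefault(feature, 0);
          double rate = BASE_RATE / (1.0 + seen);     // decays independently per term
          weights.merge(feature, rate * error * e.getValue(), Double::sum);
          counts.put(feature, seen + 1);
        }
      }
    }

Because common terms rack up large counts quickly, their effective rate drops toward zero, which is the sense in which this trick also takes care of IDF weighting and stop lists.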