I found this paper on supervised LDA:
http://www.cs.princeton.edu/~blei/papers/BleiMcAuliffe2007.pdf

Could an implementation of supervised LDA, in your opinion, be done easily
given the already existing implementation?

2012/3/27 Dirk Weissenborn <[email protected]>

> The fact that the dictionary stays fixed could be a problem for what I
> want to do. Would it be hard to change in the code?
>
> I know of one kind of supervised LDA, called DiscLDA... do you know of
> others?
>
> If the model is only slightly changed (training on just a few new inputs
> compared to the overall number of documents the model was already trained
> on), I think retraining a classifier should converge fast if the old
> model parameters are taken as starting parameters, so I guess this would
> not be too much of a drawback.
>
> The reason I want to do this is that LDA works really well on sparse
> training datasets, and it has been shown that a composition of LDA and a
> classifier is much faster at the classification task, with results
> similar to those of tf-idf based classifiers.
>
> 2012/3/27 Jake Mannix <[email protected]>
>
>> On Mon, Mar 26, 2012 at 5:13 PM, Dirk Weissenborn <
>> [email protected]> wrote:
>>
>> > Ok, thanks.
>> >
>> > Maybe one question: is it possible to train an already trained model
>> > on just a few new documents with the provided algorithms, or do you
>> > have to train on the whole corpus again? What I mean is whether you
>> > can train an LDA model incrementally or not.
>> So the answer to *this* question is yes: you can incrementally train
>> your LDA model with new documents, and in fact you can do this with the
>> current codebase, although how you would do it isn't really documented.
>> We currently persist a full-fledged model in between passes over the
>> corpus (because we're doing MapReduce iteratively, this happens
>> naturally), and this model doesn't "know" which documents were used to
>> train it (it's a big matrix of topic <-> feature counts, that's all).
>> So you could take a fully trained model, run the current LDA over it
>> using new documents as input, and it will start learning from them. It
>> won't, however, be able to learn *new vocabulary* without changing the
>> code a little: the dictionary stays fixed in the current codebase.
>>
>> > What I would actually like to do is train a topical classifier on top
>> > of an LDA model. Do you have any experience with that? I mean, by
>> > changing the LDA model, the inputs for the classifier would also
>> > change. Do I have to train a classifier from scratch again, or can I
>> > reuse the classifier trained on top of the older LDA model and just
>> > adjust that one slightly?
>>
>> This is actually a rather different question. Let me make sure I
>> understand what you're asking: you are using the p(topic | doc) values
>> for each doc as *input features* for another classifier, right? Updating
>> your model (which itself can be thought of as a fuzzy classifier
>> consisting of weights p(feature | topic) and an inference algorithm
>> that produces p(topic | doc) when fed a document as a weighted feature
>> vector) while keeping the p(topic | doc) vectors fixed will definitely
>> be wrong: if the model changes, these weights need to be updated. The
>> only way to do this is to run the documents against the model in some
>> way.
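The model described above — a topic <-> feature count matrix that new
documents can be folded into, plus an inference step that produces
p(topic | doc) — can be sketched as follows. This is a simplified,
illustrative numpy sketch of CVB0-style inference and incremental
updating, not Mahout's actual code; the function names, the smoothing
constant, and the update details are assumptions made for illustration.

```python
import numpy as np

def infer_doc_topics(model, doc, alpha=0.1, iters=20):
    """Toy CVB0-style E-step: estimate p(topic | doc) against a fixed
    topic x vocab count matrix `model`. `doc` is {term_id: count}.
    Names and smoothing are illustrative, not Mahout's implementation."""
    n_topics = model.shape[0]
    # p(feature | topic): rows of the count matrix, normalized
    phi = (model + 1e-9) / (model + 1e-9).sum(axis=1, keepdims=True)
    theta = np.full(n_topics, 1.0 / n_topics)  # start uniform
    for _ in range(iters):
        counts = np.zeros(n_topics)
        for term, cnt in doc.items():
            gamma = phi[:, term] * (theta + alpha)  # per-token responsibilities
            counts += cnt * gamma / gamma.sum()
        theta = counts / counts.sum()
    return theta

def incremental_update(model, new_docs, alpha=0.1):
    """Continue training: fold the expected topic-feature counts of new
    documents into the existing count matrix. The vocabulary (number of
    columns) stays fixed -- exactly the limitation discussed above."""
    for doc in new_docs:
        theta = infer_doc_topics(model, doc, alpha)
        phi = (model + 1e-9) / (model + 1e-9).sum(axis=1, keepdims=True)
        for term, cnt in doc.items():
            gamma = phi[:, term] * (theta + alpha)
            model[:, term] += cnt * gamma / gamma.sum()
    return model
```

Note that after `incremental_update`, any p(topic | doc) feature vectors
computed against the old counts are stale and would need to be re-inferred
before feeding a downstream classifier, which is the point made above.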
>> But you don't really want to do that; you want to update your secondary
>> classifier after your input topics drift a bit upon having been updated.
>> The problem still remains, however: your secondary classifier was
>> trained on input features which have changed, so it most likely needs
>> to be retrained as well.
>>
>> If, on the other hand, you are training a joint classifier (like
>> Supervised LDA, or Labeled LDA), and you ran *this* in an online mode,
>> you could probably update your classifier continually as you got new
>> labeled training data. But I'm speculating at this point. :)
>>
>> > 2012/3/27 Dirk Weissenborn <[email protected]>
>> >
>> > > No problem, I'll post it.
>> > >
>> > > 2012/3/27 Jake Mannix <[email protected]>
>> > >
>> > >> Hey Dirk,
>> > >>
>> > >> Do you mind continuing this discussion on the mailing list? Lots
>> > >> of our users may ask this kind of question in the future...
>> > >>
>> > >> On Mon, Mar 26, 2012 at 3:36 PM, Dirk Weissenborn <
>> > >> [email protected]> wrote:
>> > >>
>> > >>> Ok, thanks.
>> > >>>
>> > >>> Maybe one question: is it possible to train an already trained
>> > >>> model on just a few new documents with the provided algorithms,
>> > >>> or do you have to train on the whole corpus again? What I mean is
>> > >>> whether you can train an LDA model incrementally or not.
>> > >>> What I would actually like to do is train a topical classifier on
>> > >>> top of an LDA model. Do you have any experience with that? I
>> > >>> mean, by changing the LDA model, the inputs for the classifier
>> > >>> would also change. Do I have to train a classifier from scratch
>> > >>> again, or can I reuse the classifier trained on top of the older
>> > >>> LDA model and just adjust that one?
>> > >>> 2012/3/26 Jake Mannix <[email protected]>
>> > >>>
>> > >>>> On Mon, Mar 26, 2012 at 12:58 PM, Dirk Weissenborn <
>> > >>>> [email protected]> wrote:
>> > >>>>
>> > >>>> > Thank you for the quick response! It is possible that I'll
>> > >>>> > need it in the not-too-far future; maybe I'll implement it on
>> > >>>> > top of what already exists, which should not be that hard, as
>> > >>>> > you mentioned. I'll provide a patch when the time comes.
>> > >>>>
>> > >>>> Feel free to email any questions about using the
>> > >>>> InMemoryCollapsedVariationalBayes0 class - it's mainly been used
>> > >>>> for testing so far, but if you want to take that class, clean it
>> > >>>> up, and look into fixing the online learning aspect of it,
>> > >>>> that'd be excellent. Let me know if you make any progress,
>> > >>>> because I'll probably be looking to work on this at some point
>> > >>>> as well, but I won't if you're already working on it.
>> > >>>> :)
>> > >>>>
>> > >>>> > 2012/3/26 Jake Mannix <[email protected]>
>> > >>>> >
>> > >>>> > > Hi Dirk,
>> > >>>> > >
>> > >>>> > > This has not been implemented in Mahout, but the version of
>> > >>>> > > map-reduce (batch)-learned LDA which is done via
>> > >>>> > > (approximate + collapsed) variational Bayes [1] is
>> > >>>> > > reasonably easily modifiable to the methods in this paper,
>> > >>>> > > as the LDA learner we currently do via iterative MR passes
>> > >>>> > > is essentially an ensemble learner: each subset of the data
>> > >>>> > > partially trains a full LDA model starting from the
>> > >>>> > > aggregate (summed) counts of all of the data from previous
>> > >>>> > > iterations (see the method named "approximately distributed
>> > >>>> > > LDA" / AD-LDA in Ref-[2]).
>> > >>>> > >
>> > >>>> > > The method in the paper you refer to turns traditional VB
>> > >>>> > > (the slower, uncollapsed kind, with the nasty digamma
>> > >>>> > > functions all over the place) into a streaming learner, by
>> > >>>> > > accreting the word counts of each document onto the model
>> > >>>> > > you're using for inference on the next documents. The exact
>> > >>>> > > same idea can be applied to the CVB0 inference technique
>> > >>>> > > almost without change, as VB differs from CVB0 only in the
>> > >>>> > > E-step, not the M-step.
>> > >>>> > >
>> > >>>> > > The problem which comes up when I've considered doing this
>> > >>>> > > kind of thing in the past is that if you do it in a
>> > >>>> > > distributed fashion, each member of the ensemble starts
>> > >>>> > > learning different topics simultaneously, and then the
>> > >>>> > > merge gets trickier.
>> > >>>> > > You can avoid this with some of the techniques mentioned in
>> > >>>> > > [2] for HDP, where you swap topic ids on merge to make sure
>> > >>>> > > they match up, but I haven't investigated that very
>> > >>>> > > thoroughly. The other way to avoid this problem is to use
>> > >>>> > > the parameter denoted \rho_t in Hoffman et al - this
>> > >>>> > > parameter tells us how much to weight the model as it was up
>> > >>>> > > until now against the updates from the latest document
>> > >>>> > > (alternatively, how much to "decay" previous documents). If
>> > >>>> > > you don't let the topics drift *too much* during parallel
>> > >>>> > > learning, you could probably make sure that they match up
>> > >>>> > > just fine on each merge, while still speeding up the process
>> > >>>> > > compared to fully batch learning.
>> > >>>> > >
>> > >>>> > > So yeah, this is a great idea, but getting it to work in a
>> > >>>> > > distributed fashion is tricky. In a non-distributed form,
>> > >>>> > > this idea is almost completely implemented in the class
>> > >>>> > > InMemoryCollapsedVariationalBayes0. I say "almost" because
>> > >>>> > > it's technically in there already, as a parameter choice
>> > >>>> > > (initialModelCorpusFraction != 0), but I don't think it's
>> > >>>> > > working properly yet. If you're interested in the problem,
>> > >>>> > > playing with this class would be a great place to start!
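Hoffman et al. define the schedule rho_t = (tau0 + t)^(-kappa), with
kappa in (0.5, 1] and tau0 >= 0 slowing down early updates, and blend the
old model against the estimate from the latest mini-batch with that
weight. A minimal numpy sketch of that blending, with illustrative names
and shapes (count matrices, not Mahout's or Hoffman's actual API):

```python
import numpy as np

def rho(t, tau0=1024.0, kappa=0.7):
    """Learning rate from Hoffman et al.: rho_t = (tau0 + t)^(-kappa).
    Default values of tau0 and kappa here are illustrative choices."""
    return (tau0 + t) ** (-kappa)

def blend(model, minibatch_model, t):
    """Weight the model 'as it was up until now' against the estimate
    from the latest mini-batch. Both arguments are topic x vocab
    matrices; a small rho_t means previous documents decay slowly."""
    r = rho(t)
    return (1.0 - r) * model + r * minibatch_model
```

In the distributed setting sketched above, keeping rho_t small would
limit how far each ensemble member's topics can drift between merges.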
>> > >>>> > > References:
>> > >>>> > > 1) http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
>> > >>>> > > 2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf
>> > >>>> > >
>> > >>>> > > On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <
>> > >>>> > > [email protected]> wrote:
>> > >>>> > >
>> > >>>> > > > Hello,
>> > >>>> > > >
>> > >>>> > > > I wanted to ask whether there is already an online
>> > >>>> > > > learning algorithm implementation for LDA or not?
>> > >>>> > > >
>> > >>>> > > > http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
>> > >>>> > > >
>> > >>>> > > > cheers,
>> > >>>> > > > Dirk
>> > >>>> > >
>> > >>>> > > --
>> > >>>> > > -jake
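The AD-LDA-style merge discussed in the thread — each ensemble member
starts from the aggregate summed counts, partially trains on its own
subset of the data, and the results are merged — can be sketched as
below. The delta-summing detail and all names are assumptions for
illustration, not Mahout's implementation, and the sketch assumes topic
ids already line up across partitions (the very problem Jake flags).

```python
import numpy as np

def adlda_merge(global_model, partition_models):
    """AD-LDA-style merge: each worker copied `global_model` (a topic x
    vocab count matrix) and trained on its own partition; the driver
    sums the per-partition count *deltas* back into the global model,
    so counts added on different workers are all retained."""
    merged = global_model.copy()
    for local in partition_models:
        merged += local - global_model  # only counts added this pass
    return merged
```

Each iteration of the MapReduce scheme would then redistribute the merged
matrix as the starting point for the next pass.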
