We should consider changing algorithms. MDCA is a good candidate. So would nested Dirichlet processes. Neither of these is necessarily much more difficult to implement than PLSI, and both should give better results.
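For a sense of the baseline being compared against, PLSI fit with EM alternates two passes over the document-word counts. Below is a minimal pure-Python sketch. It assumes Hofmann's asymmetric aspect model, P(w|d) = sum_z P(z|d) P(w|z), held in plain in-memory dicts. The function names `plsi_em` and `log_likelihood` are hypothetical. This is neither Mahout code nor the distributed variant from the Google paper.

```python
import math
import random


def plsi_em(counts, n_topics, n_iters=50, seed=0):
    """Sketch: fit PLSI, P(w|d) = sum_z P(z|d) P(w|z), by EM.

    counts: list of non-empty dicts, counts[d][w] = n(d, w).
    Returns (p_z_d, p_w_z) with p_z_d[d][z] and p_w_z[z][w].
    Hypothetical helper for illustration, not a Mahout API.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})

    def normalized(xs):
        s = sum(xs)
        return [x / s for x in xs]

    # Random, strictly positive initialisation of both distributions.
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in counts]
    p_w_z = [dict(zip(vocab, normalized([rng.random() + 1e-3 for _ in vocab])))
             for _ in range(n_topics)]

    for _ in range(n_iters):
        # Expected counts accumulated for the M-step.
        exp_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in counts]
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                # E-step: P(z|d,w) is proportional to P(z|d) P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    r = n * joint[z] / total
                    exp_w_z[z][w] += r
                    exp_z_d[d][z] += r
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [dict(zip(vocab, normalized([t[w] for w in vocab])))
                 for t in exp_w_z]
        p_z_d = [normalized(row) for row in exp_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d, w) * log P(w|d); EM never decreases this."""
    return sum(n * math.log(sum(p_z_d[d][z] * p_w_z[z][w]
                                for z in range(len(p_w_z))))
               for d, doc in enumerate(counts) for w, n in doc.items())


docs = [{"apple": 2, "banana": 1}, {"banana": 1, "cherry": 3}, {"apple": 1, "cherry": 2}]
p_z_d, p_w_z = plsi_em(docs, n_topics=2, n_iters=30)
print(log_likelihood(docs, p_z_d, p_w_z))
```

Because each EM iteration cannot lower the log-likelihood, comparing it across iteration counts from the same seed is a cheap sanity check on an implementation.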
On 4/15/08 12:52 PM, "Grant Ingersoll (JIRA)" <[EMAIL PROTECTED]> wrote:

>
>     [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589206#action_12589206 ]
>
> Grant Ingersoll commented on MAHOUT-31:
> ---------------------------------------
>
> My bad, I thought there was a patch here. I just want to avoid the case of
> someone who has knowledge that they think they are infringing and still puts
> up a patch.
>
> So, in that case, I am fine if someone other than Ankur takes it up (or who
> works with Ankur, I think). I just am a bit paranoid since we are so early
> stage, I don't want anything to derail the positive momentum we have going
> here.
>
>> Implementation of PLSI that uses EM
>> -----------------------------------
>>
>>                 Key: MAHOUT-31
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-31
>>             Project: Mahout
>>          Issue Type: New Feature
>>            Reporter: Isabel Drost
>>
>> This should implement the proposal in the original Google Paper on PLSI in
>> news retrieval.
