Ankur,

You might like to take a quick look at the following two papers, which provide a strong extension to pLSI:

www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
cosco.hiit.fi/Articles/buntineBohinj.pdf

The Buntine/Jakulin paper in particular provides a relatively simple algorithm that has significant advantages over plain pLSI and that would be quite amenable to parallelization in the style of your EM work.

On 3/10/08 1:01 AM, "Ankur (JIRA)" <[EMAIL PROTECTED]> wrote:

> [ https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576890#action_12576890 ]
>
> Ankur commented on MAHOUT-4:
> ----------------------------
>
> Thanks for your comment. A few of my replies below:
>
>> Maybe you might ..
> Will make these changes in the next patch update.
>
>> ... - how many cluster numbers do you expect ...?
> Typically I would expect a user:cluster ratio of 1000:1. So for 1 million
> users, 1000 clusters would be created.
>
> In the main method, a sample user-story matrix is provided which can be
> changed to experiment. However, if required, I can write a small unit test
> case that produces a randomly generated user-story matrix, but I am not sure
> that will help much.
>
>> I know EM as ...
> I like the idea of a general EM framework. Will definitely try to change the
> code so that it reflects EM more generically, as suggested.
>
>> Simple prototype for Expectation Maximization (EM)
>> --------------------------------------------------
>>
>> Key: MAHOUT-4
>> URL: https://issues.apache.org/jira/browse/MAHOUT-4
>> Project: Mahout
>> Issue Type: New Feature
>> Reporter: Ankur
>> Attachments: Mahout_EM.patch
>>
>> Create a simple prototype implementing Expectation Maximization (EM) that
>> demonstrates the algorithm's functionality given a set of (user, click-url)
>> data.
>> The prototype should be functionally complete and should serve as a basis for
>> the Map-Reduce version of the EM algorithm.
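
For orientation, here is a minimal sketch of the kind of EM prototype the issue describes: plain pLSI fitted to a small user-story click matrix. It is not taken from Mahout_EM.patch, and it does not implement the extensions in the two papers above. The function names (plsi_em, log_likelihood), the parameters, and the sample matrix are all hypothetical, chosen only for illustration. It assumes the asymmetric pLSI form P(story | user) = sum over z of P(z | user) * P(story | z).

```python
import math
import random


def _normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zero)."""
    total = sum(row)
    if total <= 0.0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsi_em(counts, num_clusters, iterations=50, seed=0):
    """Fit plain pLSI by EM. counts[u][s] = clicks by user u on story s.

    Returns (p_z_given_u, p_s_given_z). Illustrative sketch only.
    """
    rng = random.Random(seed)
    num_users, num_stories = len(counts), len(counts[0])

    # Random positive starting point for both sets of distributions.
    p_z_given_u = [_normalize([rng.random() + 1e-3 for _ in range(num_clusters)])
                   for _ in range(num_users)]
    p_s_given_z = [_normalize([rng.random() + 1e-3 for _ in range(num_stories)])
                   for _ in range(num_clusters)]

    for _ in range(iterations):
        new_zu = [[0.0] * num_clusters for _ in range(num_users)]
        new_sz = [[0.0] * num_stories for _ in range(num_clusters)]
        for u in range(num_users):
            for s in range(num_stories):
                n = counts[u][s]
                if n == 0:
                    continue
                # E-step: posterior over clusters for this (user, story) pair.
                post = [p_z_given_u[u][z] * p_s_given_z[z][s]
                        for z in range(num_clusters)]
                norm = sum(post)
                if norm == 0.0:
                    continue
                # Accumulate expected counts for the M-step.
                for z in range(num_clusters):
                    r = n * post[z] / norm
                    new_zu[u][z] += r
                    new_sz[z][s] += r
        # M-step: renormalize the expected counts into distributions.
        p_z_given_u = [_normalize(row) for row in new_zu]
        p_s_given_z = [_normalize(row) for row in new_sz]

    return p_z_given_u, p_s_given_z


def log_likelihood(counts, p_z_given_u, p_s_given_z):
    """Sum over observed clicks of n(u, s) * log P(s | u)."""
    total = 0.0
    for u, row in enumerate(counts):
        for s, n in enumerate(row):
            if n:
                p = sum(p_z_given_u[u][z] * p_s_given_z[z][s]
                        for z in range(len(p_s_given_z)))
                total += n * math.log(p)
    return total


if __name__ == "__main__":
    # Hypothetical 4-user x 4-story click matrix.
    clicks = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 4, 2], [0, 1, 2, 3]]
    p_zu, p_sz = plsi_em(clicks, num_clusters=2)
    for u, dist in enumerate(p_zu):
        print("user", u, "-> cluster", dist.index(max(dist)))
```

The E-step touches each (user, story) pair independently and the M-step only sums expected counts, which is what makes the loop a natural candidate for a map phase followed by a reduce phase.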
