Ankur,

You might like to take a quick look at the following two papers, which
provide strong extensions of pLSI:

www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
cosco.hiit.fi/Articles/buntineBohinj.pdf

The Buntine/Jakulin paper in particular provides a relatively simple algorithm
that has significant advantages over plain pLSI and that would be quite
amenable to parallelization in the style of your EM work.
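
For comparison, here is a rough sketch of the plain pLSI EM baseline that
those papers extend. It is not the algorithm from either paper and not the
code in your patch. The function name plsi_em, the (user, url) -> count input
format, and the asymmetric P(url|user) = sum_z P(z|user) P(url|z)
parameterization are all assumptions made for illustration.

```python
import random


def plsi_em(counts, num_clusters, iterations=50, seed=0):
    """Fit a small pLSI model with EM.

    counts: dict mapping (user, url) -> click count (hypothetical format).
    Returns (P(z|user), P(url|z)) as nested dicts.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in counts})
    urls = sorted({s for _, s in counts})
    clusters = range(num_clusters)

    def random_dist(keys):
        # Strictly positive random weights, normalized to sum to 1.
        weights = {k: rng.random() + 0.1 for k in keys}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    p_z_given_u = {u: random_dist(clusters) for u in users}
    p_s_given_z = {z: random_dist(urls) for z in clusters}

    for _ in range(iterations):
        # Accumulators for expected counts; these are plain sums, so they
        # could be computed per (user, url) pair and combined afterwards.
        user_cluster = {u: [0.0] * num_clusters for u in users}
        cluster_url = {z: {s: 0.0 for s in urls} for z in clusters}

        for (u, s), n in counts.items():
            # E-step: posterior over clusters for this (user, url) pair.
            post = [p_z_given_u[u][z] * p_s_given_z[z][s] for z in clusters]
            norm = sum(post)
            if norm == 0.0:
                continue
            for z in clusters:
                expected = n * post[z] / norm
                user_cluster[u][z] += expected
                cluster_url[z][s] += expected

        # M-step: renormalize the expected counts into probabilities.
        for u in users:
            total = sum(user_cluster[u])
            if total > 0.0:
                p_z_given_u[u] = {z: user_cluster[u][z] / total for z in clusters}
        for z in clusters:
            total = sum(cluster_url[z].values())
            if total > 0.0:
                p_s_given_z[z] = {s: v / total for s, v in cluster_url[z].items()}

    return p_z_given_u, p_s_given_z
```

The E-step touches each (user, url) pair independently and the M-step only
needs sums of the expected counts, which is the property that makes this kind
of update a natural fit for a map/reduce formulation.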


On 3/10/08 1:01 AM, "Ankur (JIRA)" <[EMAIL PROTECTED]> wrote:

> 
>     [ https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576890#action_12576890 ]
> 
> Ankur commented on MAHOUT-4:
> ----------------------------
> 
> Thanks for your comment. A few replies below:
>> Maybe you might ..
> Will make these changes in the next patch update.
> 
>> ... - how many cluster numbers do you expect ...?
> Well, typically I would expect a user:cluster ratio of 1000:1. So for 1 million
> users, 1000 clusters would be created.
> 
> In the main method, a sample user-story matrix is provided, which can be changed
> to experiment. However, if required, I can write a small unit test case that
> produces a randomly generated user-story matrix, but I am not sure that will
> help much.
> 
>> I know EM as ...
> I like the idea of a general EM framework. Will definitely try to change the
> code so that it reflects EM more generically, as suggested.
> 
> 
> 
>> Simple prototype for Expectation Maximization (EM)
>> --------------------------------------------------
>> 
>>                 Key: MAHOUT-4
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-4
>>             Project: Mahout
>>          Issue Type: New Feature
>>            Reporter: Ankur
>>         Attachments: Mahout_EM.patch
>> 
>> 
>> Create a simple prototype implementing Expectation Maximization (EM) that
>> demonstrates the algorithm's functionality given a set of (user, click-url)
>> data.
>> The prototype should be functionally complete and should serve as a basis for
>> the Map-Reduce version of the EM algorithm.
