It is hard to say, but if your skinny forms of the matrix will fit in
memory, then you will only need three or so map-reduce steps with no
iteration.  They should take about as long as your current iterations or a
little faster.

For the matrices quoted in the paper you cited, I think I could even do the
computation in R.

On Tue, Jul 26, 2011 at 12:08 AM, Sebastian Schelter <[email protected]> wrote:

> Hi Ted,
>
> can you estimate how big the performance/scalability gain of using random
> projection would be compared to using the standard approach of iterative
> matrix-vector multiplication?
>
> I already have working prototype code for the latter one, where I extend
> the approach to do a multiplication between the adjacency matrix and a
> matrix of steadystate probability vectors so several random walks can be
> done in parallel.
>
> If the gain would be huge I'd try to dig into random projection though.
>
> --sebastian
>
>
> On 25.07.2011 18:06, Ted Dunning wrote:
>
>> This is the same as solving this system:
>>
>>    (cM - I) r_k = (c-1) e_k
>>
>> To do that, you can use any decomposition technique to compute an
>> approximate functional inverse of (cM - I).
>>
>> SVD works so you can use the stochastic SVD algorithm that Dimitriy did.
>>  You probably can short-cut that substantially, however.
>>
>>       A = (cM - I)
>>
>>       Y = A \Omega
>>       Q_1 R_1 = Y
>>       B = Q_1^T A
>>
>> Normally, you compute the SVD decomposition of B here, but I think you
>> might be able to solve your problem directly at this point.  Of course,
>> the SVD decomposition of B is very cheap if it fits into memory.
>>
>> On Mon, Jul 25, 2011 at 8:34 AM, Sebastian Schelter <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>    I don't know as I'm not familiar with the technique of Random
>>    Projection to be honest. I know of a paper doing the walk in a
>>    lower-dimensional space:
>>    
>> http://www.cs.cmu.edu/~htong/_**_pdf/ICDM06_tong.pdf<http://www.cs.cmu.edu/~htong/__pdf/ICDM06_tong.pdf>
>>    
>> <http://www.cs.cmu.edu/~htong/**pdf/ICDM06_tong.pdf<http://www.cs.cmu.edu/~htong/pdf/ICDM06_tong.pdf>
>> >
>>
>>    I plan to base the implementation on the formula from the Pegasus
>>    paper, where one searches for the vector r_k that satisfies the
>>    following equation:
>>
>>    r_k = cMr_k + (1-c)e_k
>>
>>    with c being the probability not to teleport away from a reached
>>    node, M being the transposed, row-normalized adjacency matrix
>>    (similar to the one used in PageRank) and e_k being a vector with
>>    the same dimension as the number of vertices with the kth element
>>    being 1 and all others 0.
>>
>>    This formulation maps kinda nice to Map/Reduce as we simply need to
>>    create the adjacency matrix and then just need to iteratively
>>    multiply it by the steadystate probability vector (similar to
>>    PageRank). So we could include this algorithm without having to
>>    write a lot of custom code because we can express a lot of it
>>    through our already existing linear algebra code.
>>
>>    --sebastian
>>
>>
>>    On 25.07.2011 17 <tel:25.07.2011%2017>:17, Ted Dunning wrote:
>>
>>        Can this be done with a random projection?
>>
>>        On Mon, Jul 25, 2011 at 4:59 AM, Sebastian Schelter (JIRA)
>>        <[email protected] <mailto:[email protected]>>**wrote:
>>
>>
>>
>>                 [
>>            https://issues.apache.org/__**jira/browse/MAHOUT-773?page=__**
>> com.atlassian.jira.plugin.__**system.issuetabpanels:all-__**tabpanel<https://issues.apache.org/__jira/browse/MAHOUT-773?page=__com.atlassian.jira.plugin.__system.issuetabpanels:all-__tabpanel>
>>            <https://issues.apache.org/**jira/browse/MAHOUT-773?page=**
>> com.atlassian.jira.plugin.**system.issuetabpanels:all-**tabpanel<https://issues.apache.org/jira/browse/MAHOUT-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel>
>> >]
>>
>>            Sebastian Schelter updated MAHOUT-773:
>>            ------------------------------**__--------
>>
>>                Description:
>>            I'll create an implementation of Random Walk with Restarts
>>            as described in
>>            Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph
>>            Mining System -
>>            Implementation and Observations"
>>            http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_**
>> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf>
>>            <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-**
>> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>
>> >
>>
>>            The algorithm is a random walk similar to PageRank with the
>>            difference that
>>            you start at and teleport to a certain node. The
>>            probabilities it computes
>>            can be seen as a measure of proximity between the start node
>>            and a reached
>>            node. To my knowledge RWR can be e.g used for link
>>            predicition in social
>>            networks.
>>
>>            I will try to create an implementation that is able to do
>>            several walks in
>>            parallel and I will assume that a steadystate probability
>>            vector fits in
>>            memory.
>>
>>            I don't plan to use the implementation details from the
>>            paper but I'll
>>            model the algorithm as an iterative multiplication between
>>            the adjacency
>>            matrix of the graph and the matrix created from the
>>            steadystate probability
>>            vectors for the vertices we compute the random walks for.
>>
>>              was:
>>            I'll create an implementation of Random Walk with Restarts
>>            as described in
>>            Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph
>>            Mining System -
>>            Implementation and Observations"
>>            http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_**
>> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf>
>>            <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-**
>> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>
>> >
>>
>>            The algorithm is a random walk similar to PageRank with the
>>            difference that
>>            you start at and teleport to a certain node. The
>>            probabilities it computes
>>            can be seen as a measure of proximity between the start node
>>            and a reached
>>            node. To my knowledge RWR can be e.g used for link
>>            predicition in social
>>            networks.
>>
>>            I will try to create an implementation that is able to do
>>            several walks in
>>            parallel and I will assume that a steadystate probability
>>            vector fits in
>>            memory.
>>
>>
>>                Implement Random Walk with Restarts
>>                ------------------------------**__-----
>>
>>                                 Key: MAHOUT-773
>>                                 URL:
>>                
>> https://issues.apache.org/__**jira/browse/MAHOUT-773<https://issues.apache.org/__jira/browse/MAHOUT-773>
>>                
>> <https://issues.apache.org/**jira/browse/MAHOUT-773<https://issues.apache.org/jira/browse/MAHOUT-773>
>> >
>>                             Project: Mahout
>>                          Issue Type: New Feature
>>                          Components: Graph
>>                    Affects Versions: 0.6
>>                            Reporter: Sebastian Schelter
>>                            Assignee: Sebastian Schelter
>>
>>                I'll create an implementation of Random Walk with
>>                Restarts as described
>>
>>            in Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale
>>            Graph Mining System
>>            - Implementation and Observations"
>>            http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_**
>> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf>
>>            <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-**
>> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>
>> >
>>
>>                The algorithm is a random walk similar to PageRank with
>>                the difference
>>
>>            that you start at and teleport to a certain node. The
>>            probabilities it
>>            computes can be seen as a measure of proximity between the
>>            start node and a
>>            reached node. To my knowledge RWR can be e.g used for link
>>            predicition in
>>            social networks.
>>
>>                I will try to create an implementation that is able to
>>                do several walks
>>
>>            in parallel and I will assume that a steadystate probability
>>            vector fits in
>>            memory.
>>
>>                I don't plan to use the implementation details from the
>>                paper but I'll
>>
>>            model the algorithm as an iterative multiplication between
>>            the adjacency
>>            matrix of the graph and the matrix created from the
>>            steadystate probability
>>            vectors for the vertices we compute the random walks for.
>>
>>            --
>>            This message is automatically generated by JIRA.
>>            For more information on JIRA, see:
>>            
>> http://www.atlassian.com/__**software/jira<http://www.atlassian.com/__software/jira>
>>            
>> <http://www.atlassian.com/**software/jira<http://www.atlassian.com/software/jira>
>> >
>>
>>
>>
>>
>>
>>
>>
>

Reply via email to