Re: Row Similarity

Reza Zadeh Wed, 10 Dec 2014 18:19:47 -0800

Here we go: https://issues.apache.org/jira/browse/SPARK-4823


On Wed, Dec 10, 2014 at 9:01 PM, Debasish Das <debasish.da...@gmail.com>
wrote:

> I added code to compute topK products for each user and topK users for each
> product in SPARK-3066.
>
> That is different from the row similarity calculation, as we need both user
> and product factors to calculate the topK recommendations.
>
> For (1) and (2) we are trying to find similar users for a given user and
> similar products for a given product.
>
> Similar products for a given product are straightforward to compute through
> columnSimilarities/DIMSUM when the product dimension is skinny.
>
> Similar users for a given user will need a map-reduce implementation of row
> similarity, since the matrix is tall.
>
> I don't see a JIRA for that yet. Are there any good references for a
> map-reduce implementation of row similarity?
>
> On Wed, Dec 10, 2014 at 2:30 PM, Reza Zadeh <r...@databricks.com> wrote:
>
>> It's not so cheap to compute row similarities when there are many rows,
>> as it amounts to computing AA^T for a matrix A, which has one entry per
>> pair of rows and is therefore expensive.
>>
>> There is a JIRA to track handling (1) and (2) more efficiently than
>> computing all pairs: https://issues.apache.org/jira/browse/SPARK-3066
>>
>>
>>
>> On Wed, Dec 10, 2014 at 2:44 PM, Debasish Das <debasish.da...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> It seems there are multiple places where we would like to compute row
>>> similarities (exact or approximate).
>>>
>>> Through RowMatrix columnSimilarities we can already compute the column
>>> similarities of a tall skinny matrix.
>>>
>>> Similarly, we should have an API in RowMatrix called rowSimilarities that
>>> computes similar rows in a map-reduce fashion. It would be useful for the
>>> following use cases:
>>>
>>> 1. Generate topK users for each user from matrix factorization model
>>> 2. Generate topK products for each product from matrix factorization model
>>> 3. Generate kernel matrix for use in spectral clustering
>>> 4. Generate kernel matrix for use in kernel regression/classification
>>>
>>> I am not sure whether there is already a good map-reduce implementation
>>> of row similarity that we can use. Ideas like fastfood and kitchen sink
>>> felt more suited to the classification use case, but user similarities
>>> also show up in recommendation, which is unsupervised.
>>>
>>> Is there a JIRA tracking it? If not, I can open one and we can discuss it
>>> further there.
>>>
>>> Thanks.
>>> Deb
>>>
>>
>>
>
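
For readers following the thread, here is a minimal single-machine sketch of what "row similarity" means in the discussion above: cosine similarity between every pair of rows, i.e. the entries of AA^T after each row is scaled to unit length, followed by a topK selection per row. It is plain Python for illustration only, not Spark code and not the approach proposed in SPARK-4823 or SPARK-3066. The function names (normalize_rows, row_similarities, top_k_similar) and the user_factors data are hypothetical.

```python
import heapq
import math


def normalize_rows(rows):
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


def row_similarities(rows):
    """Cosine similarity of every pair of rows.

    After normalization this is exactly A A^T. The cost is O(n^2 * d)
    for n rows of length d, which is why it gets expensive for tall matrices.
    """
    unit = normalize_rows(rows)
    n = len(unit)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the result is symmetric, so fill both halves
            dot = sum(a * b for a, b in zip(unit[i], unit[j]))
            sims[i][j] = sims[j][i] = dot
    return sims


def top_k_similar(rows, k):
    """For each row index, return the k most similar other rows
    as (row_index, similarity) pairs, most similar first."""
    sims = row_similarities(rows)
    result = {}
    for i, sim_row in enumerate(sims):
        candidates = [(s, j) for j, s in enumerate(sim_row) if j != i]
        best = heapq.nlargest(k, candidates)
        result[i] = [(j, s) for s, j in best]
    return result


# Hypothetical user factors: one row per user, one column per latent factor.
user_factors = [
    [1.0, 0.0],
    [2.0, 0.1],
    [0.0, 3.0],
    [0.1, 1.0],
]
print(top_k_similar(user_factors, 1))
```

The nested loop over all row pairs is the quadratic cost mentioned above. A distributed version would need to avoid materializing every pair, which is the part this sketch does not attempt.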
