[
https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015639#comment-14015639
]
Dmitriy Lyubimov edited comment on MAHOUT-1365 at 6/2/14 5:49 PM:
------------------------------------------------------------------
[~ssc] Since you've done this before, can you please eyeball this and make a
suggestion?
My current implementation proceeds with computations based on formula (7) in
the pdf, which is in turn derived directly from both papers. (We ignore the
baseline confidence, which I denote c_0, in which case the expression under
inversion comes apart as V'V, which is common and tiny for all item vectors, so
it is computed once and broadcast; and then the individual per-item correction
U'D^(i)U, which takes only the rows of U where the confidence is non-trivial
(c != c_0).)
That more or less means that every row of U has to send a message to every row
of V for which c != c_0. I previously did this with Pregel. It turns out that
in Spark, Bagel is a moot point, since it simply uses groupBy underneath rather
than a custom multicast communication. Still, if I did it today, I would have
to do a coGroup or something to achieve a similar effect. The question is
whether there's a neat way to translate this into our current set of linear
algebra primitives, or whether this would be our first case where we'd have to
create a method that is in part tightly coupled to Spark. Any thoughts?
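The decomposition above (a shared Gram matrix V'V plus a per-item low-rank correction from the non-baseline confidences) can be sketched in NumPy. This is a minimal illustrative sketch of the Hu-Koren-Volinsky normal equations with a baseline confidence c_0, not Mahout code; the function name and signature are hypothetical.

```python
import numpy as np

def solve_item(U, conf_i, pref_i, c0=1.0, lam=0.01):
    """Solve for one item's factor vector given the user-factor matrix U.

    conf_i: {user_index: confidence} for users with c != c_0 (the
            non-trivial entries only); everyone else has confidence c0.
    pref_i: {user_index: preference} for observed preferences.
    """
    k = U.shape[1]
    # Common part: c_0 * U'U is the same for every item, so in the
    # distributed setting it is computed once and broadcast.
    A = c0 * (U.T @ U) + lam * np.eye(k)
    # Per-item correction U' D^(i) U: only rows of U where c != c_0
    # contribute, since D^(i) holds the deviations (c - c_0).
    for u, c in conf_i.items():
        A += (c - c0) * np.outer(U[u], U[u])
    # Right-hand side U' C^(i) p(i): only observed preferences matter.
    rhs = np.zeros(k)
    for u, p in pref_i.items():
        rhs += conf_i.get(u, c0) * p * U[u]
    return np.linalg.solve(A, rhs)
```

The point of the split is that the dense c_0 * U'U term never has to travel per item; only the (sparse) correction terms do, which is exactly the multicast pattern described above.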
> Weighted ALS-WR iterator for Spark
> ----------------------------------
>
> Key: MAHOUT-1365
> URL: https://issues.apache.org/jira/browse/MAHOUT-1365
> Project: Mahout
> Issue Type: Task
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 1.0
>
> Attachments: distributed-als-with-confidence.pdf
>
>
> Given preference P and confidence C distributed sparse matrices, compute the
> ALS-WR solution for implicit feedback (Spark Bagel version).
> Following the Hu-Koren-Volinsky method (stripping off any concrete
> methodology for building the C matrix), with a parameterized test for
> convergence.
> The computational scheme follows the ALS-WR method (which should be slightly
> more efficient for sparser inputs).
> The best performance will be achieved if non-sparse anomalies are prefiltered
> (eliminated), such as an anomalously active user who doesn't represent a
> typical user anyway.
> The work is going on here:
> https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala. I am
> porting our (A1) implementation away, so there are a few issues associated
> with that.
--
This message was sent by Atlassian JIRA
(v6.2#6252)