On Tue, Nov 12, 2013 at 1:18 PM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:
> [ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820475#comment-13820475 ]
>
> Dmitriy Lyubimov commented on MAHOUT-1346:
> ------------------------------------------
>
> https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala
>
> I started moving some things there. In particular, ALS is not there yet
> (I still haven't hashed it out with my boss), but there are some initial
> matrix algorithms to be picked up (even transposition can be blockified
> and improved).
>
> Anyone want to give me a hand on this?
>
> Please don't pick weighted ALS-WR for now; I still hope to finish
> porting it.
>
> There are more interesting questions there, like parameter validation
> and fitting. A common problem I have: suppose you take the
> implicit-feedback approach. Then you reformulate the input in terms of
> preference (P) and confidence (C). The original paper describes a
> specific scheme for forming C that includes one parameter they want to
> fit.
>
> A more interesting question is: what if we have more than one parameter?
> I.e., what if we have a whole range of user behaviors -- say, an item
> search, browse, click, add-to-cart, and finally acquisition? That's a
> whole bunch of parameters over a user's preference. Suppose we want to
> explore what's worth what. The natural way to do it is, again, through
> cross-validation.
>
> However, since there are many parameters, the task becomes rather less
> tractable. Since there is not much test data (we should still assume we
> will have just a handful of cross-validation runs), "online" convex
> search techniques like SGD or BFGS are not going to be very viable. What
> I was thinking is: maybe we can start running parallel tries and fit the
> data to paraboloids (i.e., second-degree polynomial regression without
> interaction terms). That might be a big assumption, but it would be
> enough.
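To make the parallel-tries idea concrete, here is a rough sketch (illustrative Python with NumPy, not Mahout code; the function names and data are made up). It fits a handful of cross-validation scores to a second-degree polynomial without interaction terms, then reads off, per parameter axis, either the vertex (a candidate optimum) or a wrong-sign curvature that would suggest flipping the preference mapping:

```python
# Illustrative sketch only, under the stated assumptions (axis-wise
# quadratic model, parameter independence). Not Mahout code.
import numpy as np

def fit_axiswise_quadratic(params, scores):
    """params: (n_tries, n_params) matrix of tried parameter settings;
    scores: (n_tries,) cross-validation scores, higher is better.
    Returns (intercept, linear coeffs, quadratic coeffs)."""
    n, k = params.shape
    # Design matrix [1, x_1, x_1^2, ..., x_k, x_k^2] -- no interaction terms.
    cols = [np.ones((n, 1))]
    cols += [np.column_stack((params[:, j], params[:, j] ** 2)) for j in range(k)]
    X = np.hstack(cols)
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return beta[0], beta[1::2], beta[2::2]

def analyze(lin, quad):
    """Per axis: negative curvature (a cap) gives a vertex -b/(2a) as a
    candidate optimum; non-negative curvature (a cup / hyperbolic
    direction) is flagged -- the preference mapping may be wrong."""
    out = []
    for b, a in zip(lin, quad):
        if a < 0:
            out.append(("optimum", -b / (2 * a)))
        else:
            out.append(("flip?", None))
    return out
```

With k parameters this needs at least 2k + 1 well-spread tries, and the paraboloid assumption only holds locally, so this is an automation aid for the cross-validation search rather than a replacement for it.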
> Of course we may discover paraboloid properties

Sorry, I meant "hyperboloid", or perhaps "hyperbolic paraboloid",
properties here; that would be the proper term.

> along some parameter axes, in which case it would mean we got the
> preference wrong, so we flip the preference mapping (i.e., click = (P=1,
> C=0.5) would flip into click = (P=0, C=0...)) and re-validate again.
> This is a kind of multidimensional variation of the one-parameter
> second-degree polynomial fitting that Raphael referred to once.
>
> We are taking on a lot of assumptions here (parameter independence,
> existence of a good global maximum, etc.). Perhaps there's something
> better to automate that search?
>
> Thanks.
> -Dmitriy
>
> > Spark Bindings (DRM)
> > --------------------
> >
> >                 Key: MAHOUT-1346
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1346
> >             Project: Mahout
> >          Issue Type: Improvement
> >    Affects Versions: 0.8
> >            Reporter: Dmitriy Lyubimov
> >            Assignee: Dmitriy Lyubimov
> >             Fix For: Backlog
> >
> > Spark bindings for Mahout DRM.
> > DRM DSL.
> > Disclaimer: this will all be experimental at this point.
> > The idea is to wrap DRM in a Spark RDD with support for some basic
> > functionality, perhaps some humble beginning of a cost-based optimizer:
> > (0) Spark serialization support for Vector, Matrix
> > (1) Bagel transposition
> > (2) slim X'X
> > (2a) not-so-slim X'X
> > (3) blockify() (compose RDD containing vertical blocks of original input)
> > (4) read/write Mahout DRM off HDFS
> > (5) A'B
> > ...
>
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
