On Tue, Nov 12, 2013 at 1:18 PM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:
> [ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820475#comment-13820475 ]
>
> Dmitriy Lyubimov commented on MAHOUT-1346:
> ------------------------------------------
>
> https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala
>
> I started moving some things there. In particular, ALS is not there yet
> (I still haven't hashed it out with my boss), but there are some initial
> matrix algorithms to be picked up (even transposition can be blockified
> and improved).
>
> Anyone want to give me a hand on this?
>
> Please don't pick weighted ALS-WR for now; I still hope to finish
> porting it.
>
> There are more interesting questions there, like parameter validation
> and fitting. A common problem I have: suppose you take the
> implicit-feedback approach. Then you reformulate the input in terms of
> preference (P) and confidence (C). The original paper describes a
> specific scheme for forming C that includes one parameter they want to
> fit.
>
> A more interesting question is: what if we have more than one parameter?
> I.e., what if we have a whole range of user behaviors -- say, an item
> search, browse, click, add-to-cart, and finally acquisition? That's a
> whole bunch of parameters over a user's preference. Suppose we want to
> explore what's worth what. The natural way to do it is, again, through
> cross-validation.
>
> However, since there are many parameters, the task becomes rather less
> tractable. Since there is not much test data (we should still assume we
> will have just a handful of cross-validation runs), "online" convex
> search techniques like SGD or BFGS are not going to be very viable. What
> I was thinking is: maybe we can start running parallel tries and fit the
> data to paraboloids (i.e., second-degree polynomial regression without
> interaction terms). That might be a big assumption, but it would be
> enough.
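To make the parallel-tries idea concrete, here is a rough sketch (illustrative Python with NumPy, not Mahout code; the function names and data are made up). It fits a handful of cross-validation scores to a second-degree polynomial without interaction terms, then reads off, per parameter axis, either the vertex (a candidate optimum) or a wrong-sign curvature that would suggest flipping the preference mapping:

```python
# Illustrative sketch only, under the stated assumptions (axis-wise
# quadratic model, parameter independence). Not Mahout code.
import numpy as np

def fit_axiswise_quadratic(params, scores):
    """params: (n_tries, n_params) matrix of tried parameter settings;
    scores: (n_tries,) cross-validation scores, higher is better.
    Returns (intercept, linear coeffs, quadratic coeffs)."""
    n, k = params.shape
    # Design matrix [1, x_1, x_1^2, ..., x_k, x_k^2] -- no interaction terms.
    cols = [np.ones((n, 1))]
    cols += [np.column_stack((params[:, j], params[:, j] ** 2)) for j in range(k)]
    X = np.hstack(cols)
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return beta[0], beta[1::2], beta[2::2]

def analyze(lin, quad):
    """Per axis: negative curvature (a cap) gives a vertex -b/(2a) as a
    candidate optimum; non-negative curvature (a cup / hyperbolic
    direction) is flagged -- the preference mapping may be wrong."""
    out = []
    for b, a in zip(lin, quad):
        if a < 0:
            out.append(("optimum", -b / (2 * a)))
        else:
            out.append(("flip?", None))
    return out
```

With k parameters this needs at least 2k + 1 well-spread tries, and the paraboloid assumption only holds locally, so this is an automation aid for the cross-validation search rather than a replacement for it.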
> Of course we may discover paraboloid properties

Sorry, I meant "hyperboloid", or perhaps "hyperbolic paraboloid",
properties here; that would be the proper term.

> along some parameter axes, in which case it would mean we got the
> preference wrong, so we flip the preference mapping (i.e., click = (P=1,
> C=0.5) would flip into click = (P=0, C=0...)) and re-validate again.
> This is a kind of multidimensional variation of the one-parameter
> second-degree polynomial fitting that Raphael referred to once.
>
> We are taking on a lot of assumptions here (parameter independence,
> existence of a good global maximum, etc.). Perhaps there's something
> better to automate that search?
>
> Thanks.
> -Dmitriy
>
> > Spark Bindings (DRM)
> > --------------------
> >
> >                 Key: MAHOUT-1346
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1346
> >             Project: Mahout
> >          Issue Type: Improvement
> >    Affects Versions: 0.8
> >            Reporter: Dmitriy Lyubimov
> >            Assignee: Dmitriy Lyubimov
> >             Fix For: Backlog
> >
> > Spark bindings for Mahout DRM.
> > DRM DSL.
> > Disclaimer: this will all be experimental at this point.
> > The idea is to wrap DRM in a Spark RDD with support for some basic
> > functionality, perhaps some humble beginning of a cost-based optimizer:
> > (0) Spark serialization support for Vector, Matrix
> > (1) Bagel transposition
> > (2) slim X'X
> > (2a) not-so-slim X'X
> > (3) blockify() (compose RDD containing vertical blocks of original input)
> > (4) read/write Mahout DRM off HDFS
> > (5) A'B
> > ...
>
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
