On Wed, Mar 13, 2013 at 5:39 AM, Sean Owen <[email protected]> wrote:

> On Wed, Mar 13, 2013 at 12:06 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > Also. I still have an impression as i mentioned that adaptive version of
> > algorithm is not available and specifying lambda for als-wr is left to
> > operator's intuition? This is probably a bigger issue even than the
>
> (What's the adaptive version? I don't know of an implementation that
> dynamically chooses lambda, but you can always choose it with
> cross-validation. And that could be done in-line with iterations I guess.)
Mm, sort of. The way I understand it, the purpose of cross-validation is to find the expectation of a cost function (error), because of course we are not interested in just a point estimate; such an estimate would exhibit far too large a standard error itself to be reliable. We do run K folds of training, but the parameters of the training stay unchanged. So we still need some sort of search for the argmin of the cost over lambda.

Indeed, some R implementations do it "fold" style, where they train for, say, 20 different lambdas on an exponential scale within reasonable bounds and then pick the one that yields the best expected cost. A slightly less computationally intensive approach, as Rafael suggested, was to do iterative improvement based on far fewer estimates (say 3) fit to a second-degree curve, with its single minimum taken as the best guess for the next iteration. That would require significantly fewer total flops, with significant precision benefits. But again, you have k-fold runs, each requiring quite a few iterations for ALS itself to converge (say 20), multiplied sequentially by the number of lambda search steps. (Btw, the number of iterations is itself ideally a parameter to optimize, since too few will result in unacceptable underfit and muddy the waters even more -- but luckily this has a monotonic effect on cost, so we generally ignore it.)

Next, we discussed whether we could bootstrap lambda on a subset. Ted said that this approach had insurmountable problems: the optimum would not be the same, and projecting it onto a large dataset is complicated. This is also the reason it cannot be done "inline" -- you need to run through the entire dataset to get a reliable cost estimate for a given lambda.

That was it. Brute-force adaptivity was thought to be a costly multiplication of flops and thus was not implemented, and the iterative approach was even less approachable because of the explosive growth in iteration count. So it had been left at that. Or so is my interpretation.
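To make the two strategies above concrete, here is a minimal sketch -- emphatically not Mahout code. It uses plain ridge regression as a cheap stand-in for one full per-lambda ALS-WR training run, and all function names and the synthetic data are my own illustration: a brute-force k-fold grid over an exponential lambda scale, and the cheaper 3-point second-degree-curve refinement.

```python
import numpy as np

def kfold_cv_cost(X, y, lam, k=5, seed=0):
    """Expected held-out squared error at a given lambda, averaged over
    k folds. Ridge regression stands in for one per-lambda training run."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    d = X.shape[1]
    costs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # ridge normal equations: (X'X + lam*I) w = X'y
        w = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(d),
                            X[train].T @ y[train])
        resid = X[test] @ w - y[test]
        costs.append(float(np.mean(resid ** 2)))
    return float(np.mean(costs))

def grid_search_lambda(X, y, lambdas):
    """Brute-force 'fold style' search: one full k-fold CV per lambda,
    then take the argmin of the expected cost."""
    costs = [kfold_cv_cost(X, y, lam) for lam in lambdas]
    return lambdas[int(np.argmin(costs))], costs

def parabolic_refine(lams, costs):
    """Cheaper iterative step: fit a second-degree curve through three
    (log-lambda, cost) estimates and return its vertex as the next guess."""
    a, b, _ = np.polyfit(np.log(lams), costs, 2)
    if a <= 0:  # curve opens downward: no interior minimum, keep best point
        return float(lams[int(np.argmin(costs))])
    return float(np.exp(-b / (2 * a)))

# Synthetic demo data and a 20-point exponential-scale grid.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=200)
lambdas = list(np.exp(np.linspace(np.log(1e-4), np.log(10.0), 20)))
best, costs = grid_search_lambda(X, y, lambdas)

# Refine around the grid winner using its two neighbors.
i = min(max(int(np.argmin(costs)), 1), len(lambdas) - 2)
next_lam = parabolic_refine(lambdas[i - 1:i + 2], costs[i - 1:i + 2])
```

Note the flop accounting the text describes: the grid variant pays 20 lambdas x k folds x (one full training run each), while the parabolic variant spends only 3 cost estimates per outer step -- but each outer step is still sequential, which is exactly the iteration-count explosion mentioned above.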
Please correct me if I am wrong. One way or another, I was under the impression that the current version forces my hand into manually managing a bisection-like search for the lambda optimum.

> > pieces we know about, with some sweat and tears could be solved with a
> > more constraining technology B as well as more naturally with its
> > superset technology A, what is the merit of making such a choice in
> > favor of B, debatable maturity issues of either choice aside?
>
> I'm speaking for myself but the huge reason is that technology B is widely
> used and mature, and rightly or wrongly in demand, and customers are trying
> to make use of idle resources exposed via B. If using A is only easier for
> the product developer, that's great (and going to lead to better results
> long-term) but not something the customer is interested in. I say
> "customer" but this goes for consumers of open source code.

In other words, maturity arguments. As I said, I have already considered and accepted those. Maturity arguments are debatable, though. The maturity of a product doesn't necessarily improve with age. In fact, the 0.18-0.20 revisions were, by my stats, much more accident-free than CDH3 and on. MapR built a business around fixing operational issues in the Apache version. At some point the tasktrackers in CDH3 had a memory leak, and we had to round-boot them every couple of days or so. Just this last long weekend our sysops were firehosing namenode problems again. YARN is fresh out of the oven. Need I say more?

I accept the maturity argument in the sense that Hadoop is _commercially_ mature (having specifically the MapR distro in mind), or as EMR. As a long-term, production-grade redundant store -- well, I guess we can agree to disagree here :)

The customer-affinity argument, like a lot of the other arguments presented, goes along the lines of "it covers some problems". That doesn't mean it covers anywhere close to 100%. Not the case here; not the case there.
> > And finally, on the side of pragmatic project management, why even
> > artificially favor either choice if we only rely on non-commercial
> > contributions? Why do we even want to oppose any diversification
> > attempts on any ground, as long as we manage them incubator-style along
> > with established, safe graduation policies to ensure chaos control?
> > Viable things will find their use and adoption. (Well, maybe I am a
> > little bit optimistic here. Nonviable tech seems to thrive for years as
> > well, just on the pitch alone.) If they don't find their way into
> > Mahout, they will eventually flourish elsewhere (assuming their
> > viability).
>
> I think this leads to a jumble of half-baked code. A playground of bits of
> code is fine, but why push it together into a project that implies it's
> going to be coherent, supported?

Sorry, I never suggested that. Just collaborate on GitHub.

> Any effort is just tacking on more bits and pieces that are ever less
> related to the other bits. This is excellent -- on Github. Why not stick a
> fork in it?

Ah, but that's exactly what I meant by "incubator mode". GitHub, or a contrib project, whichever. Except I am saying we will probably be better off encouraging and helping new contributors to do so: by suggesting best practices and advising on approaches; by being aware of where they are and what they are doing; by communicating with them; by exploring mutual interests; by exchanging ideas; by putting the mainline merging criteria up front. Maybe some of this will merge into the mainstream if its value is demonstrated. Who knows. Doing Crunch adaptations? Cool! GitHub/sidekick project -- show what you mean there. Doing a Scala mapping? Yay! Show us what you mean. Etc.

Keep in mind: this discussion is not about new methods and bits. This discussion is about new environments. And the motto of Mahout has been declared to be conscious of big data but agnostic of environment. Are you saying you are not in support of that statement?
Why is SGD deemed a valuable contribution, but adaptive ALS on Spark would not be? Neither relies on Hadoop. What technically sets those choices so far apart?

OK, I have probably already laid out my case.
