I think "just classifiers on just Hadoop" is enough for a project. I think "most ML on just Hadoop" is quite a big project, and even that hasn't nearly been completed here. I think "most ML on most platforms" is far too big in scope. It inevitably results in a village of lightly connected experiments. Not bad, but not a project.
Understandably, there's virtually no attention to JIRAs here, or to existing code, because it's mostly from someone else, structured differently, and not very related to what anyone else knows. I think that is the downside of saying scope is unlimited. I can't reliably consume this project if it looks like code just goes stale or has no relation to itself. That's the potential for harm -- nothing wrong with shiny new anything per se. Etc etc, I'm sure this has been heard many times.

Adding C++ or a completely different distributed environment seems to exacerbate that, when it seems more practical, easier, and more fun to contemplate new projects. That's why I say this.

On Wed, Mar 13, 2013 at 7:00 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Keep in mind. This discussion is not about new methods and bits. This
> discussion is about new environments. And the motto of Mahout has been
> declared to be conscious of big data but agnostic of environment. Are you
> saying you are not in support of that statement? Why say SGD is deemed a
> valuable contribution but adaptive ALS on spark would not? Neither relies
> on Hadoop. What technically sets those choices so much apart?
>
>
