I think "just classifiers on just Hadoop" is enough for a project. I think "most ML on just Hadoop" is quite a big project, and even that hasn't nearly been completed here. I think "most ML on most platforms" is far too big in scope. It inevitably results in a village of lightly connected experiments. Not bad, but not a project.
Understandably, there's virtually no attention to JIRAs here, or to existing code, because it's mostly from someone else, structured differently, and not very related to what anyone else knows. I think that is the downside of saying scope is unlimited. I can't reliably consume this project if it looks like code just goes stale or has no relation to itself. That's the potential for harm -- nothing wrong with shiny new anything per se. Etc etc, I'm sure this has been heard many times.

Adding C++ or a completely different distributed environment seems to exacerbate that, when it seems more practical, easier, and more fun to contemplate new projects. That's why I say this.

On Wed, Mar 13, 2013 at 7:00 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Keep in mind. This discussion is not about new methods and bits. This
> discussion is about new environments. And the motto of Mahout has been
> declared to be conscious of big data but agnostic of environment. Are you
> saying you are not in support of that statement? Why say SGD is deemed a
> valuable contribution but adaptive ALS on spark would not? Neither relies
> on Hadoop. What technically sets those choices so much apart?
>
>
