Hey Sean,

I hear what you are saying. I've been working on the RF classifiers, and the community/code could use a little more cohesion.

Having a "most ML on most platforms" project would be a good thing. You point out valid organizational hurdles. Are there organizational changes that could be made to support a larger project scope? Large-scale projects do get done.

Marty


On 03/13/2013 01:55 PM, Sean Owen wrote:
I think "just classifiers on just Hadoop" is enough for a project. I think
"most ML on just Hadoop" is quite a big project, and even that hasn't
nearly been completed here. I think "most ML on most platforms" is far too
big in scope. It inevitably results in a village of lightly-connected
experiments. Not bad, but not a project.

Understandably, there's virtually no attention to JIRAs here, or to existing
code, because it's mostly from someone else, structured differently, and not
very related to what anyone else knows. I think that is the downside of saying
the scope is unlimited. I can't reliably consume this project if it looks like
code just goes stale or has no relation to itself. That's the potential for
harm -- nothing wrong with shiny new anything per se. Etc etc, I'm sure this
has been heard many times.

Adding C++ or a completely different distributed environment seems to
exacerbate that, when it seems more practical, easier, and more fun to
contemplate them as new projects. That's why I say this.


On Wed, Mar 13, 2013 at 7:00 PM, Dmitriy Lyubimov <[email protected]> wrote:
Keep in mind: this discussion is not about new methods and bits. This
discussion is about new environments. And the motto of Mahout has been
declared to be conscious of big data but agnostic of environment. Are you
saying you are not in support of that statement? Why is SGD deemed a
valuable contribution while adaptive ALS on Spark would not be? Neither relies
on Hadoop. What technically sets those choices so far apart?


