On 03/06/2015 12:44 PM, Pat Ferrel wrote:
This is great.
So we’ve talked about a name change and shortly we’ll be forced to come up with
something that describes what Mahout has become. Most past users think of it as
a scalable ML library on Hadoop. That may describe Mahout-Legacy but it seems
like we need a name for the Scala DSL/Spark/other? part of the project. Lots of
projects have sub-projects so we know there is no issue with naming
sub-projects. So my question to everyone is:
Should (or can) the Top Level Project be renamed? If so to what?
I don't like the idea of a top level name change. I think that it would
be a much better idea to direct our resources at polishing and
developing what we have now. As well, especially for this release, I
think that it would do a disservice to the "legacy" components (which as
you point out have not been deprecated) with ~45 completed bugfixes and
several more in the pipe.
If we don’t rename the TLP then what should we call legacy (not very appealing)
and scala/DSL (not a name really)?
agreed. Legacy is not the most appealing name. Maybe something like
Mahout-MapReduce? Though that could cause some confusion regarding the
"no new MapReduce code"
My opinion:
Since we are deemphasizing legacy I’m not sure there is a need to call
attention to it by giving it a subproject name. However it is not deprecated so
we need to include it in releases and even fix the minimum of critical bugs for
some time to come.
agreed regarding fixing critical legacy bugs. Looking through the
issues last night there didn't seem to be a lot of critical bugs, and
probably a good number of issues can be closed out as won't fix/not an
issue.
Mahout is getting beat up in the circles of those who talk about such things
and much of this is because people don’t understand what it has become.
Therefore I’d like to see a project rename to reset expectations. Leave the
name Mahout for legacy stuff and give a new name to the Scala environment.
Split the builds and create new docs for the Scala stuff. This would seem to
make it easier to document, since legacy is most of what the CMS documents; we
could create a whole new template for the new project name.
What is the upside to splitting the builds? I'm not against it- I'm just
not sure I understand.
Failing this, many of the same benefits could be gained by creating legacy and
scala sub-projects with better names. This I know we can do and recall that
things like MLlib are generally not tied to Spark when speaking about them. So
a subproject could have very much its own identity.
Looking at the long history of Mahout it seems like the current generality was
hard gained through implementing many special purpose algorithms, some of which
were grad student projects. This is where MLlib is today in some ways. So a
general framework and environment makes a lot of sense as the evolution of
Mahout. Let’s give it a name, something better than DSL.
I think that a pretty clear description of what the other side of the
project is has been emerging recently. IMO we need to start getting it
out there. Probably a good start would be to update the front page of
the mahout site.
I don't have any good ideas regarding names for this side of the project.
On Mar 5, 2015, at 7:43 PM, Andrew Musselman <[email protected]> wrote:
Thanks AP
On Thursday, March 5, 2015, Andrew Palumbo <[email protected]> wrote:
I went through all of the unresolved JIRA issues and marked all with at
least a "legacy" or "scala" (for lack of a better name for all that is not
legacy) label. Hopefully I got them all.
Some are labelled with both (math, build, documentation related to both or
neither, etc.)
"scala" issues:
https://issues.apache.org/jira/browse/MAHOUT-1522?jql=
project%20%3D%20MAHOUT%20AND%20resolution%20%3D%
20Unresolved%20AND%20labels%20%3D%20scala%20ORDER%20BY%20priority%20DESC
legacy issues:
https://issues.apache.org/jira/browse/MAHOUT-1522?jql=
project%20%3D%20MAHOUT%20AND%20resolution%20%3D%
20Unresolved%20AND%20labels%20%3D%20legacy%20ORDER%20BY%20priority%20DESC
Hopefully this will help us get started closing up some old issues. I'll
try to make another pass over them tomorrow and try to find some
that need to be closed out.