[
https://issues.apache.org/jira/browse/MAHOUT-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837256#comment-15837256
]
ASF GitHub Bot commented on MAHOUT-1856:
----------------------------------------
Github user rawkintrevo commented on a diff in the pull request:
https://github.com/apache/mahout/pull/246#discussion_r97715579
--- Diff:
math-scala/src/main/scala/org/apache/mahout/math/algorithms/regression/Regressor.scala
---
@@ -0,0 +1,49 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.mahout.math.algorithms.regression
+
+import org.apache.mahout.math.algorithms.regression.tests._
+import org.apache.mahout.math.drm._
+import org.apache.mahout.math.{Vector => MahoutVector}
+import org.apache.mahout.math.algorithms.Model
+import org.apache.mahout.math.drm.DrmLike
+
+import scala.reflect.ClassTag
+
+/**
+ * Abstract of Regressors
+ */
+trait Regressor[K] extends Model {
--- End diff --
I agree that Spark and Flink follow the paradigm you are suggesting,
however sklearn doesn't. If we're just going off of what others do, following
the other larger packages- yea we should probably follow conventions of what
other scala based "Big Data" packages do. However, I can't understand WHY they
do it that way- it makes the code hard to read/follow and I assume is an
artifact of all the serialization and the way they execute models (having to
ship object around for map / reduce phases), that is to say they do it because
_they are forced to_ and _at the expense of_ readability.
In Mahout, most of that is taken care of at the distributed engine level.
If we start going down the rabbit hole of "do as Spark and Flink do" we may
find ourselves with [entire class just for the summary of a linear
model](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L620).
I for one, want to stay as far away from that as possible. I'd like to see
algorithm code (these and future) be as succinct and tractable as possible so
that
1. New contributors aren't intimidated (that is to say are encouraged to
commit algorithms)
2. Those algorithms can be easily reviewed and maintained with minimal
Scala knowledge (as it limits the pool of willing and able contributors who
understand the actual math at play)
That isn't to say, at the end of the day, your proposal is incorrect- you
usually are correct and I value and appreciate you taking the time to review.
I am saying, " i think that's how this pattern goes in most kits." is neither
necessary nor sufficient imo, as in some respects I'm explicitly trying to
avoid the approach of other packages, in this case- refactoring something to be
more complex with no clear understanding of the benefit.
> Create a framework for new Mahout Clustering, Classification, and
> Optimization Algorithms
> ------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1856
> URL: https://issues.apache.org/jira/browse/MAHOUT-1856
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.12.1
> Reporter: Andrew Palumbo
> Assignee: Trevor Grant
> Priority: Critical
> Fix For: 0.13.0
>
>
> To ensure that Mahout does not become "A loose bag of algorithms", Create
> basic traits with funtions common to each class of algorithm.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)