[jira] [Commented] (MAHOUT-1856) Create a framework for new Mahout Clustering, Classification, and Optimization Algorithms

ASF GitHub Bot (JIRA) Tue, 24 Jan 2017 22:18:04 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837256#comment-15837256
 ]


ASF GitHub Bot commented on MAHOUT-1856:
----------------------------------------

Github user rawkintrevo commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/246#discussion_r97715579
  
    --- Diff: 
math-scala/src/main/scala/org/apache/mahout/math/algorithms/regression/Regressor.scala
 ---
    @@ -0,0 +1,49 @@
    +/**
    +  * Licensed to the Apache Software Foundation (ASF) under one
    +  * or more contributor license agreements. See the NOTICE file
    +  * distributed with this work for additional information
    +  * regarding copyright ownership. The ASF licenses this file
    +  * to you under the Apache License, Version 2.0 (the
    +  * "License"); you may not use this file except in compliance
    +  * with the License. You may obtain a copy of the License at
    +  *
    +  * http://www.apache.org/licenses/LICENSE-2.0
    +  *
    +  * Unless required by applicable law or agreed to in writing,
    +  * software distributed under the License is distributed on an
    +  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +  * KIND, either express or implied. See the License for the
    +  * specific language governing permissions and limitations
    +  * under the License.
    +  */
    +
    +package org.apache.mahout.math.algorithms.regression
    +
    +import org.apache.mahout.math.algorithms.regression.tests._
    +import org.apache.mahout.math.drm._
    +import org.apache.mahout.math.{Vector => MahoutVector}
    +import org.apache.mahout.math.algorithms.Model
    +import org.apache.mahout.math.drm.DrmLike
    +
    +import scala.reflect.ClassTag
    +
    +/**
    +  * Abstract of Regressors
    +  */
    +trait Regressor[K] extends Model {
    --- End diff --
    
    I agree that Spark and Flink follow the paradigm you are suggesting, 
however sklearn doesn't.  If we're just going off of what others do, following 
the other larger packages- yea we should probably follow conventions of what 
other scala based "Big Data" packages do.  However, I can't understand WHY they 
do it that way- it makes the code hard to read/follow and I assume is an 
artifact of all the serialization and the way they execute models (having to 
ship object around for map / reduce phases), that is to say they do it because 
_they are forced to_ and _at the expense of_ readability. 
    
    In Mahout, most of that is taken care of at the distributed engine level.  
    
    If we start going down the rabbit hole of "do as Spark and Flink do" we may 
find ourselves with [entire class just for the summary of a linear 
model](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L620).
  I for one, want to stay as far away from that as possible.  I'd like to see 
algorithm code (these and future) be as succinct and tractable as possible so 
that 
    
    1. New contributors aren't intimidated (that is to say are encouraged to 
commit algorithms)
    2. Those algorithms can be easily reviewed and maintained with minimal 
Scala knowledge (as it limits the pool of willing and able contributors who 
understand the actual math at play)
    
    That isn't to say, at the end of the day, your proposal is incorrect- you 
usually are correct and I value and appreciate you taking the time to review.  
I am saying, " i think that's how this pattern goes in most kits." is neither 
necessary nor sufficient imo, as in some respects I'm explicitly trying to 
avoid the approach of other packages, in this case- refactoring something to be 
more complex with no clear understanding of the benefit. 


> Create a framework for new Mahout Clustering, Classification, and 
> Optimization  Algorithms
> ------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1856
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1856
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.12.1
>            Reporter: Andrew Palumbo
>            Assignee: Trevor Grant
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> To ensure that Mahout does not become "A loose bag of algorithms", Create 
> basic traits with funtions common to each class of algorithm. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAHOUT-1856) Create a framework for new Mahout Clustering, Classification, and Optimization Algorithms

Reply via email to