[GitHub] flink pull request: [FLINK-2157] [ml] [WIP] Create evaluation fram...

thvasilo Fri, 26 Jun 2015 05:20:52 -0700

Github user thvasilo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/871#discussion_r33350057
  
    --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/evaluation/Score.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.ml.evaluation
    +
    +import org.apache.flink.api.common.typeinfo.TypeInformation
    +import org.apache.flink.api.scala._
    +import org.apache.flink.ml._
    +
    +import scala.reflect.ClassTag
    +
    +/**
    + * Evaluation score
    + *
    + * Takes a whole data set and then computes the evaluation score on it (the result is,
    + * again, encoded in a DataSet)
    + *
    + * @tparam PredictionType output type
    + */
    +trait Score[PredictionType] {
    +  def evaluate(trueAndPredicted: DataSet[(PredictionType, PredictionType)]): DataSet[Double]
    +}
    +
    +/** Traits to allow us to determine at runtime if a Score is a loss (lower is better) or a
    +  * performance score (higher is better)
    +  */
    +trait Loss
    +
    +trait PerformanceScore
    +
    +/**
    + * Metrics expressible as a mean of a function taking output pairs as input
    + *
    + * @param scoringFct function to apply to all elements
    + * @tparam PredictionType output type
    + */
    +abstract class MeanScore[PredictionType: TypeInformation: ClassTag](
    +    scoringFct: (PredictionType, PredictionType) => Double)
    +    (implicit yyt: TypeInformation[(PredictionType, PredictionType)])
    +  extends Score[PredictionType] with Serializable {
    +  def evaluate(trueAndPredicted: DataSet[(PredictionType, PredictionType)]): DataSet[Double] = {
    +    trueAndPredicted.map(yy => scoringFct(yy._1, yy._2)).mean()
    +  }
    +}
    +
    +
    +// TODO: Return to functions in the companion object; classes are more cumbersome
    +/**
    + * Squared loss function
    + *
    + * returns (y1 - y2)^2
    + *
    + * @return a Loss object
    + */
    +class SquaredLoss extends MeanScore[Double]((y1, y2) => (y1 - y2) * (y1 - y2)) with Loss
    --- End diff --
    
    I'm just trying out extending Score/MeanScore here instead of defining these inside the companion object.
    
    I will probably revert this to defining the concrete scores in the companion object, but also create more classes inheriting from Score, one for each specific task (regression, classification).
    
    The concrete scores would then be defined in the companion objects of those inheriting classes.
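The layout proposed above, with one Score subclass per task and the concrete scores living in each subclass's companion object, might look roughly like the following. This is an illustrative sketch only: RegressionScore, ClassificationScore, absoluteLoss, and accuracy are hypothetical names, Seq again stands in for DataSet, and labels are assumed to be encoded as Doubles.

```scala
// Flink-free sketch of the proposed layout: one Score subclass per task, with
// the concrete scores as factory methods on that subclass's companion object.
// Seq stands in for DataSet; all class and method names are hypothetical.
object TaskScoresSketch {
  trait Score[P] {
    def evaluate(trueAndPredicted: Seq[(P, P)]): Double
  }

  trait Loss
  trait PerformanceScore

  // Shared helper: mean of a per-pair function (NaN for empty input).
  private def meanOf[P](f: (P, P) => Double)(pairs: Seq[(P, P)]): Double = {
    val scores = pairs.map { case (t, p) => f(t, p) }
    scores.sum / scores.size
  }

  // Scores for regression tasks: true and predicted values are Doubles.
  class RegressionScore(f: (Double, Double) => Double) extends Score[Double] {
    def evaluate(trueAndPredicted: Seq[(Double, Double)]): Double = meanOf(f)(trueAndPredicted)
  }

  object RegressionScore {
    def squaredLoss: RegressionScore with Loss =
      new RegressionScore((y1, y2) => (y1 - y2) * (y1 - y2)) with Loss
    def absoluteLoss: RegressionScore with Loss =
      new RegressionScore((y1, y2) => math.abs(y1 - y2)) with Loss
  }

  // Scores for classification tasks: labels assumed to be encoded as Doubles.
  class ClassificationScore(f: (Double, Double) => Double) extends Score[Double] {
    def evaluate(trueAndPredicted: Seq[(Double, Double)]): Double = meanOf(f)(trueAndPredicted)
  }

  object ClassificationScore {
    def accuracy: ClassificationScore with PerformanceScore =
      new ClassificationScore((y, yHat) => if (y == yHat) 1.0 else 0.0) with PerformanceScore
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1.0, 3.0), (2.0, 2.0))
    println(RegressionScore.squaredLoss.evaluate(pairs))  // 2.0
    println(ClassificationScore.accuracy.evaluate(pairs)) // 0.5
  }
}
```

Callers would then write RegressionScore.squaredLoss instead of new SquaredLoss, and the return types (for example RegressionScore with Loss) still expose the marker traits for runtime checks.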



