[GitHub] flink pull request: [FLINK-2050] Introduces new pipelining mechani...

thvasilo Thu, 21 May 2015 01:44:24 -0700

Github user thvasilo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/704#discussion_r30783065
  
    --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/StandardScaler.scala ---
    @@ -22,38 +22,47 @@ import breeze.linalg
     import breeze.numerics.sqrt
     import breeze.numerics.sqrt._
     import org.apache.flink.api.common.functions._
    +import org.apache.flink.api.common.typeinfo.TypeInformation
     import org.apache.flink.api.scala._
     import org.apache.flink.configuration.Configuration
    -import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
    +import org.apache.flink.ml.common.{LabeledVector, Parameter, ParameterMap}
     import org.apache.flink.ml.math.Breeze._
    -import org.apache.flink.ml.math.Vector
    +import org.apache.flink.ml.math.{BreezeVectorConverter, Vector}
    +import org.apache.flink.ml.pipeline.{TransformOperation, FitOperation, Transformer}
     import org.apache.flink.ml.preprocessing.StandardScaler.{Mean, Std}
     
    +import scala.reflect.ClassTag
    +
     /** Scales observations, so that all features have a user-specified mean and standard deviation.
       * By default for [[StandardScaler]] transformer mean=0.0 and std=1.0.
       *
    -  * This transformer takes a [[Vector]] of values and maps it to a
    -  * scaled [[Vector]] such that each feature has a user-specified mean and standard deviation.
    +  * This transformer takes a subtype of  [[Vector]] of values and maps it to a
    +  * scaled subtype of [[Vector]] such that each feature has a user-specified mean and standard
    +  * deviation.
       *
       * This transformer can be prepended to all [[Transformer]] and
    -  * [[org.apache.flink.ml.common.Learner]] implementations which expect an input of
    -  * [[Vector]].
    +  * [[org.apache.flink.ml.pipeline.Predictor]] implementations which expect as input a subtype
    +  * of [[Vector]].
       *
       * @example
       *          {{{
       *            val trainingDS: DataSet[Vector] = env.fromCollection(data)
       *            val transformer = StandardScaler().setMean(10.0).setStd(2.0)
       *
    -  *            transformer.transform(trainingDS)
    +  *            transformer.fit(trainingDS)
    +  *            val transformedDS = transformer.transform(trainingDS)
       *          }}}
       *
       * =Parameters=
       *
    -  * - [[StandardScaler.Mean]]: The mean value of transformed data set; by default equal to 0
    -  * - [[StandardScaler.Std]]: The standard deviation of the transformed data set; by default
    +  * - [[Mean]]: The mean value of transformed data set; by default equal to 0
    +  * - [[Std]]: The standard deviation of the transformed data set; by default
       * equal to 1
    --- End diff --
    
    Why use just the short names here ([[Mean]], [[Std]]), but the fully qualified ones in the ALS docstring?



