[ 
https://issues.apache.org/jira/browse/SPARK-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wojciech Jurczyk updated SPARK-12751:
-------------------------------------
    Description: 
Many Estimators and Transformers mix in traits generated by 
[SharedParamsCodeGen|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala].
These Estimators and Transformers (such as StringIndexer and MinMaxScaler) are 
public, while the traits generated by SharedParamsCodeGen are 
private\[ml\]. User code can invoke the methods that these traits introduce, 
but it cannot refer to the traits themselves. For example, you can call 
setInputCol(str) on a StringIndexer, but you cannot assign a StringIndexer to a 
variable of type HasInputCol:
{code:java}
val x: HasInputCol = new StringIndexer() // Does not compile: HasInputCol is private[ml].
{code}
As a result, it is impossible to create a collection of transformers that have 
both HasInputCol and HasOutputCol (e.g. Set\[Transformer with HasInputCol with 
HasOutputCol\]). Instead, we have to use structural typing and reflective 
calls, with a type like this:
{code}
ml.Estimator[_] { val outputCol: ml.param.Param[String] }
{code}
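
To make the workaround concrete, here is a minimal sketch of the structural-typing approach, assuming Scala 2. It does not depend on Spark: Param, StringIndexerLike and MinMaxScalerLike are hypothetical stand-ins for the real classes, and columnParamNames is an illustrative helper, not part of any API.

```scala
import scala.language.reflectiveCalls

// Hypothetical stand-ins for Spark ML classes, so the sketch runs without Spark.
object StructuralTypingSketch {
  final class Param[T](val name: String)

  class StringIndexerLike {
    val inputCol = new Param[String]("inputCol")
    val outputCol = new Param[String]("outputCol")
  }

  class MinMaxScalerLike {
    val inputCol = new Param[String]("inputCol")
    val outputCol = new Param[String]("outputCol")
  }

  // Structural type: any object exposing both members, whatever traits it declares.
  type InOut = {
    val inputCol: Param[String]
    val outputCol: Param[String]
  }

  // Member access through a structural type is resolved reflectively at runtime.
  def columnParamNames(stage: InOut): (String, String) =
    (stage.inputCol.name, stage.outputCol.name)

  def main(args: Array[String]): Unit = {
    val stages: Seq[InOut] = Seq(new StringIndexerLike, new MinMaxScalerLike)
    stages.map(columnParamNames).foreach(println)
  }
}
```

This compiles and runs, but each access goes through reflection and the type only checks member names and signatures, which is why it feels like a workaround rather than a solution.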

This seems easy to fix: exposing a couple of traits should not break anything. 
On the other hand, the issue may go deeper than that.
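
For comparison, here is a rough sketch of what user code could look like if the traits were public. Again, this does not use Spark: Transformer, HasInputCol, HasOutputCol and the two *Like classes are simplified, hypothetical stand-ins declared locally, so it only illustrates the typing, not the real trait definitions.

```scala
// Hypothetical stand-ins, not Spark's real classes: the shared-param traits are
// declared public here to show what user code could do if they were exposed.
object PublicTraitsSketch {
  final class Param[T](val name: String)

  trait Transformer
  trait HasInputCol { val inputCol = new Param[String]("inputCol") }
  trait HasOutputCol { val outputCol = new Param[String]("outputCol") }

  class StringIndexerLike extends Transformer with HasInputCol with HasOutputCol
  class MinMaxScalerLike extends Transformer with HasInputCol with HasOutputCol

  // The compound type from the description, now nameable from user code.
  type InOutTransformer = Transformer with HasInputCol with HasOutputCol

  val stages: Set[InOutTransformer] =
    Set(new StringIndexerLike, new MinMaxScalerLike)

  def main(args: Array[String]): Unit = {
    // Legal because the trait is visible to the caller.
    val x: HasInputCol = new StringIndexerLike
    println(x.inputCol.name)

    // Statically checked member access, no reflection involved.
    stages.foreach(s => println(s"${s.inputCol.name} -> ${s.outputCol.name}"))
  }
}
```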

> Traits generated by SharedParamsCodeGen should not be private
> -------------------------------------------------------------
>
>                 Key: SPARK-12751
>                 URL: https://issues.apache.org/jira/browse/SPARK-12751
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Wojciech Jurczyk
>            Priority: Minor


