Wojciech Jurczyk created SPARK-12751:
----------------------------------------
Summary: Traits generated by SharedParamsCodeGen should not be private
Key: SPARK-12751
URL: https://issues.apache.org/jira/browse/SPARK-12751
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.6.0, 1.5.2
Reporter: Wojciech Jurczyk
Many Estimators and Transformers mix in traits generated by
SharedParamsCodeGen. These estimators and transformers (StringIndexer,
MinMaxScaler, etc.) are public, while the traits generated by
SharedParamsCodeGen are private\[ml\]. User code can use the members these
traits contribute, such as inputCol and getInputCol, but it cannot name the
traits themselves. For example, you can call setInputCol(str) on a
StringIndexer, but you cannot assign a StringIndexer to a variable of type
HasInputCol.
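{code:scala}
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.param.shared.HasInputCol
val x: HasInputCol = new StringIndexer() // Does not compile: HasInputCol is private[ml].
{code}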
As a consequence, it is impossible to create a collection of transformers that
all have both HasInputCol and HasOutputCol (e.g. Set\[Transformer with
HasInputCol with HasOutputCol\]). Instead, we have to fall back on structural
typing and reflective calls, like this:
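{code:scala}
import org.apache.spark.ml
// Structural type; accessing outputCol requires import scala.language.reflectiveCalls.
type EstimatorWithOutputCol = ml.Estimator[_] { val outputCol: ml.param.Param[String] }
{code}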
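The difference between the two typings can be illustrated with a minimal, self-contained sketch. It assumes Scala 2 (the line Spark 1.x builds against) run as a script, and it uses hypothetical stand-ins (Param, HasInputCol, HasOutputCol, Transformer, IndexerLike, ScalerLike) that only mimic the shape of the Spark classes so that it compiles without Spark. It contrasts the nominal type this issue asks for with the structural-type workaround:

{code:scala}
import scala.language.reflectiveCalls

// Hypothetical stand-ins that only mimic the shape of Spark's Param,
// HasInputCol, HasOutputCol and Transformer; they are NOT the Spark classes.
final class Param[T](val name: String)

trait HasInputCol { final val inputCol: Param[String] = new Param[String]("inputCol") }
trait HasOutputCol { final val outputCol: Param[String] = new Param[String]("outputCol") }

abstract class Transformer { def uid: String }
class IndexerLike extends Transformer with HasInputCol with HasOutputCol { val uid = "indexer" }
class ScalerLike extends Transformer with HasInputCol with HasOutputCol { val uid = "scaler" }

// Desired: a nominal type, usable only if the traits are visible to user code.
type ColumnTransformer = Transformer with HasInputCol with HasOutputCol
val typed: Seq[ColumnTransformer] = Seq(new IndexerLike, new ScalerLike)
val typedNames = typed.map(t => s"${t.uid}:${t.inputCol.name}->${t.outputCol.name}")

// Workaround: a structural refinement; member access is resolved by reflection.
type WithOutputCol = Transformer { val outputCol: Param[String] }
val structural: Seq[WithOutputCol] = Seq(new IndexerLike, new ScalerLike)
val structuralNames = structural.map(t => t.outputCol.name)
{code}

The nominal version is checked and dispatched statically, and it exposes every member of both traits. The structural version only knows about the members spelled out in the refinement, and Scala 2 compiles each access to a reflective method lookup at run time, which is why the reflectiveCalls import is needed.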
This seems easy to fix: making a few of these traits public should not break
anything. On the other hand, there may be deeper reasons for keeping them
private.