@Alexander Using Param[String] directly has worked for us. (I think that's because String is exactly java.lang.String, rather than a Scala version of it, so it's still Java-friendly.) To expose the possible values, I've added a static list in some other classes (e.g., NaiveBayes.supportedModelTypes), though not every class has one yet.
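
Roughly, the String-plus-validator pattern could look like the sketch below. This is only an illustration, not code from Spark: the names SolverParams, SolverTypes, solverType and the values "auto", "sgd", "lbfgs" are made up. It assumes the spark.ml Param constructor that takes an isValid function and the ParamValidators.inArray helper.

```scala
import org.apache.spark.ml.param.{Param, ParamMap, ParamValidators, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical static list of allowed values (illustrative names, not Spark API).
object SolverTypes {
  val supported: Array[String] = Array("auto", "sgd", "lbfgs")
}

// Hypothetical Params holder with one string-valued "enum" parameter.
class SolverParams(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("solverParams"))

  // The isValid function rejects any String outside the supported list.
  val solverType: Param[String] = new Param[String](
    this, "solverType",
    "solver to use, one of: " + SolverTypes.supported.mkString(", "),
    ParamValidators.inArray[String](SolverTypes.supported))

  setDefault(solverType -> "auto")

  def setSolverType(value: String): this.type = set(solverType, value)
  def getSolverType: String = $(solverType)

  override def copy(extra: ParamMap): SolverParams = defaultCopy(extra)
}
```

With something like this, new SolverParams().setSolverType("bogus") should fail right away with an exception instead of at fit time, and Java callers should be able to read the allowed values through SolverTypes.supported().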

@Stephen It could be used, but I prefer String for spark.ml since it's easier to maintain consistent APIs across languages. That's what we've used so far, at least.

On Wed, Sep 16, 2015 at 6:00 PM, Stephen Boesch <java...@gmail.com> wrote:

> There was a long thread about enum's initiated by Xiangrui several months
> back in which the final consensus was to use java enum's. Is that
> discussion (/decision) applicable here?
>
> 2015-09-16 17:43 GMT-07:00 Ulanov, Alexander <alexander.ula...@hpe.com>:
>
>> Hi Joseph,
>>
>> Strings sounds reasonable. However, there is no StringParam (only
>> StringArrayParam). Should I create a new param type? Also, how can the user
>> get all possible values of String parameter?
>>
>> Best regards, Alexander
>>
>> *From:* Joseph Bradley [mailto:jos...@databricks.com]
>> *Sent:* Wednesday, September 16, 2015 5:35 PM
>> *To:* Feynman Liang
>> *Cc:* Ulanov, Alexander; dev@spark.apache.org
>> *Subject:* Re: Enum parameter in ML
>>
>> I've tended to use Strings. Params can be created with a validator
>> (isValid) which can ensure users get an immediate error if they try to pass
>> an unsupported String. Not as nice as compile-time errors, but easier on
>> the APIs.
>>
>> On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang <fli...@databricks.com>
>> wrote:
>>
>> We usually write a Java test suite which exercises the public API (e.g.
>> DCT
>> <https://github.com/apache/spark/blob/master/mllib/src/test/java/org/apache/spark/ml/feature/JavaDCTSuite.java#L71>
>> ).
>>
>> It may be possible to create a sealed trait with singleton concrete
>> instances inside of a serializable companion object, the just introduce a
>> Param[SealedTrait] to the model (e.g. StreamingDecay PR
>> <https://github.com/apache/spark/pull/8022/files#diff-cea0bec4853b1b2748ec006682218894R99>).
>> However, this would require Java users to use
>> CompanionObject$.ConcreteInstanceName to access enum values which isn't the
>> prettiest syntax.
>>
>> Another option would just be to use Strings, which although is not type
>> safe does simplify implementation.
>>
>> On Mon, Sep 14, 2015 at 5:43 PM, Ulanov, Alexander <
>> alexander.ula...@hpe.com> wrote:
>>
>> Hi Feynman,
>>
>> Thank you for suggestion. How can I ensure that there will be no problems
>> for Java users? (I only use Scala API)
>>
>> Best regards, Alexander
>>
>> *From:* Feynman Liang [mailto:fli...@databricks.com]
>> *Sent:* Monday, September 14, 2015 5:27 PM
>> *To:* Ulanov, Alexander
>> *Cc:* dev@spark.apache.org
>> *Subject:* Re: Enum parameter in ML
>>
>> Since PipelineStages are serializable, the params must also be
>> serializable. We also have to keep the Java API in mind. Introducing a new
>> enum Param type may work, but we will have to ensure that Java users can
>> use it without dealing with ClassTags (I believe Scala will create new
>> types for each possible value in the Enum) and that it can be serialized.
>>
>> On Mon, Sep 14, 2015 at 4:31 PM, Ulanov, Alexander <
>> alexander.ula...@hpe.com> wrote:
>>
>> Dear Spark developers,
>>
>> I am currently implementing the Estimator in ML that has a parameter that
>> can take several different values that are mutually exclusive. The most
>> appropriate type seems to be Scala Enum (
>> http://www.scala-lang.org/api/current/index.html#scala.Enumeration).
>> However, the current ML API has the following parameter types:
>>
>> BooleanParam, DoubleArrayParam, DoubleParam, FloatParam, IntArrayParam,
>> IntParam, LongParam, StringArrayParam
>>
>> Should I introduce a new parameter type in ML API that is based on Scala
>> Enum?
>>
>> Best regards, Alexander