In MLlib, we use strings for enum-like types in Python APIs, which is quite common in Python and easy to pass through Py4J. On the JVM side, we implement `fromString` to convert them back to enums. -Xiangrui
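A minimal sketch of that pattern (the `Solver` names here are illustrative, not MLlib's actual API): Python passes a plain string over Py4J, and the JVM side converts it back with a `fromString` helper.

```scala
// Hypothetical enum-like type; Python would send "sgd" or "l-bfgs" as a string.
sealed trait Solver
object Solver {
  case object LBFGS extends Solver
  case object SGD   extends Solver

  // Convert the string coming over Py4J back to the JVM-side enum value.
  def fromString(name: String): Solver = name.toLowerCase match {
    case "l-bfgs" => LBFGS
    case "sgd"    => SGD
    case other    => throw new IllegalArgumentException(s"Unknown solver: $other")
  }
}
```

This keeps the Python API simple (plain strings, no Py4J object wrapping) while the Scala side still works with a type-safe value.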
On Wed, Mar 11, 2015 at 12:56 PM, RJ Nowling <rnowl...@gmail.com> wrote:
> How do these proposals affect PySpark? I think compatibility with PySpark
> through Py4J should be considered.
>
> On Mon, Mar 9, 2015 at 8:39 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>> Does this matter for our own internal types in Spark? I don't think any
>> of these types are designed to be used in RDD records, for instance.
>>
>> On Mon, Mar 9, 2015 at 6:25 PM, Aaron Davidson <ilike...@gmail.com> wrote:
>>> Perhaps the problem with Java enums that was brought up was actually
>>> that their hashCode is not stable across JVMs, as it depends on the
>>> memory location of the enum itself.
>>>
>>> On Mon, Mar 9, 2015 at 6:15 PM, Imran Rashid <iras...@cloudera.com> wrote:
>>>> Can you expand on the serde issues with Java enums at all? I haven't
>>>> heard of any problems specific to enums. The Java object serialization
>>>> rules seem very clear, and it doesn't seem like different JVMs have any
>>>> choice in what they do:
>>>>
>>>> http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469
>>>>
>>>> (In a nutshell, serialization must use enum.name().)
>>>>
>>>> Of course there are plenty of ways the user could screw this up (e.g.
>>>> rename the enums, change their meaning, or remove them). But then
>>>> again, all of Java serialization has issues the user has to be aware
>>>> of. E.g., if we go with case objects, then Java serialization blows up
>>>> if you add another helper method, even if that helper method is
>>>> completely compatible.
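For what it's worth, the spec's guarantee is easy to check with a round-trip through Java serialization. This sketch uses the JDK's own `TimeUnit` enum (nothing Spark-specific) and shows that deserialization resolves back to the same singleton, because only `name()` is written:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.TimeUnit

// Serialize an object to bytes and read it back with plain Java serialization.
def roundTrip(x: AnyRef): AnyRef = {
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(x)
  oos.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray)).readObject()
}

// Enum constants survive serde as the SAME instance (resolved via name()),
// even though their hashCode is identity-based and so differs across JVM runs.
val copy = roundTrip(TimeUnit.SECONDS)
assert(copy eq TimeUnit.SECONDS)
```

So the serialized form is stable; the hashCode instability Aaron mentions only bites when hash values themselves cross JVM boundaries (e.g. enums used as partitioning keys).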
>>>>
>>>> Some prior debate in the Scala community:
>>>> https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ
>>>>
>>>> SO post on which version to use in Scala:
>>>> http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types
>>>>
>>>> SO post about the macro craziness people try to add to Scala to make
>>>> them almost as good as a simple Java enum (NB: the accepted answer
>>>> doesn't actually work in all cases ...):
>>>> http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched
>>>>
>>>> Another proposal to add better enums built into Scala, but it seems to
>>>> be dormant:
>>>> https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk
>>>>
>>>> On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>> I have a strong dislike for Java enums due to the fact that they are
>>>>> not stable across JVMs - if one undergoes serde, you can end up with
>>>>> unpredictable results at times [1]. That is one of the reasons why we
>>>>> prevent enums from being keys: it is highly possible users might
>>>>> depend on this internally and shoot themselves in the foot.
>>>>>
>>>>> It would be better to keep away from them in general and use
>>>>> something more stable.
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> [1] Having had to debug this issue for 2 weeks - I really, really hate it.
>>>>>
>>>>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <iras...@cloudera.com> wrote:
>>>>>> I have a very strong dislike for #1 (Scala Enumerations). I'm OK
>>>>>> with #4 (with Xiangrui's final suggestion, especially making it
>>>>>> sealed and available in Java), but I really think #2, Java enums,
>>>>>> are the best option.
>>>>>> Java enums actually have some very real advantages over the other
>>>>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.
>>>>>> There has been endless debate in the Scala community about the
>>>>>> problems with the approaches in Scala. Very smart, level-headed
>>>>>> Scala gurus have complained about their shortcomings (Rex Kerr's
>>>>>> name is coming to mind, though I'm not positive about that); there
>>>>>> have been numerous well-thought-out proposals to give Scala a
>>>>>> better enum. But the powers that be in Scala always reject them.
>>>>>> IIRC the explanation for rejecting them is basically that (a) enums
>>>>>> aren't important enough to introduce some new special feature --
>>>>>> Scala has bigger things to work on -- and (b) if you really need a
>>>>>> good enum, just use Java's.
>>>>>>
>>>>>> I doubt it really matters that much for Spark internals, which is
>>>>>> why I think #4 is fine. But I figured I'd give my spiel, because
>>>>>> every developer loves language wars :)
>>>>>>
>>>>>> Imran
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>>> A `case object` inside an `object` doesn't show up in Java.
>>>>>>> This is the minimal code I found to make everything show up
>>>>>>> correctly in both Scala and Java:
>>>>>>>
>>>>>>> sealed abstract class StorageLevel // cannot be a trait
>>>>>>>
>>>>>>> object StorageLevel {
>>>>>>>   private[this] case object _MemoryOnly extends StorageLevel
>>>>>>>   final val MemoryOnly: StorageLevel = _MemoryOnly
>>>>>>>
>>>>>>>   private[this] case object _DiskOnly extends StorageLevel
>>>>>>>   final val DiskOnly: StorageLevel = _DiskOnly
>>>>>>> }
>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>>>> I like #4 as well and agree with Aaron's suggestion.
>>>>>>>>
>>>>>>>> - Patrick
>>>>>>>>
>>>>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilike...@gmail.com> wrote:
>>>>>>>>> I'm cool with #4 as well, but make sure we dictate that the
>>>>>>>>> values should be defined within an object with the same name as
>>>>>>>>> the enumeration (like we do for StorageLevel). Otherwise we may
>>>>>>>>> pollute a higher namespace.
>>>>>>>>>
>>>>>>>>> E.g., we SHOULD do:
>>>>>>>>>
>>>>>>>>> trait StorageLevel
>>>>>>>>> object StorageLevel {
>>>>>>>>>   case object MemoryOnly extends StorageLevel
>>>>>>>>>   case object DiskOnly extends StorageLevel
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>>>>> #4, with a preference for CamelCaseEnums
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <jos...@databricks.com> wrote:
>>>>>>>>>>> Another vote for #4.
>>>>>>>>>>> People are already used to adding "()" in Java.
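A quick sketch of how the sealed-class pattern quoted above is consumed from Scala (hypothetical `Level`/`describe` names; the private case objects are hidden, so callers match on the public vals, which are stable identifiers):

```scala
// Same shape as the StorageLevel pattern above, with illustrative names.
sealed abstract class Level // cannot be a trait
object Level {
  private[this] case object _MemoryOnly extends Level
  final val MemoryOnly: Level = _MemoryOnly

  private[this] case object _DiskOnly extends Level
  final val DiskOnly: Level = _DiskOnly
}

// Pattern matching works through the public vals (compared by identity,
// since each val aliases a single case-object instance).
def describe(l: Level): String = l match {
  case Level.MemoryOnly => "memory"
  case Level.DiskOnly   => "disk"
}
```

One trade-off of hiding the case objects: the compiler cannot check such matches for exhaustiveness, unlike matches over visible case objects or Java enums.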
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <java...@gmail.com> wrote:
>>>>>>>>>>>> #4, but with MemoryOnly (more Scala-like):
>>>>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
>>>>>>>>>>>>
>>>>>>>>>>>> "Constants, Values, Variable and Methods
>>>>>>>>>>>>
>>>>>>>>>>>> Constant names should be in upper camel case. That is, if the
>>>>>>>>>>>> member is final, immutable and it belongs to a package object
>>>>>>>>>>>> or an object, it may be considered a constant (similar to
>>>>>>>>>>>> Java's static final members):
>>>>>>>>>>>>
>>>>>>>>>>>>   object Container {
>>>>>>>>>>>>     val MyConstant = ...
>>>>>>>>>>>>   }"
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <men...@gmail.com>:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are many places where we use enum-like types in Spark,
>>>>>>>>>>>>> but in different ways. Every approach has both pros and
>>>>>>>>>>>>> cons. I wonder whether there should be an "official"
>>>>>>>>>>>>> approach for enum-like types in Spark.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> * All values show up as Enumeration.Value in Java:
>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3. Static fields in Java (e.g., TripletFields)
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>>>>> * Doesn't need "()" in Java code.
>>>>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4. Objects in Scala (e.g., StorageLevel)
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Needs "()" in Java code.
>>>>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> It would be great if we had an "official" approach for this,
>>>>>>>>>>>>> as well as a naming convention for enum-like values
>>>>>>>>>>>>> ("MEMORY_ONLY" or "MemoryOnly"). Personally, I like 4) with
>>>>>>>>>>>>> "MEMORY_ONLY". Any thoughts?
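For reference, option 1 in miniature (a hypothetical Enumeration, not Spark's actual SchedulingMode). Every value has the erased type Enumeration#Value, which is why Java callers only ever see Enumeration.Value:

```scala
// Sketch of a Scala Enumeration; each val below has type SchedulingModeSketch.Value.
object SchedulingModeSketch extends Enumeration {
  val FAIR, FIFO, NONE = Value
}

// Lookup by name and iteration come for free, but the distinct Scala type
// information is lost at the Java boundary.
val mode = SchedulingModeSketch.withName("FAIR")
```

Compare this with option 4 (the StorageLevel-style object), where each value keeps its own named type in both Scala and Java.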
>> >> > >> >>> > > > >> >> > >> >>> > > > Best, >> >> > >> >>> > > > Xiangrui >> >> > >> >>> > > > >> >> > >> >>> > > > >> >> > >> >> --------------------------------------------------------------------- >> >> > >> >>> > > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> >> > >> >>> > > > For additional commands, e-mail: >> dev-h...@spark.apache.org >> >> > >> >>> > > > >> >> > >> >>> > > > >> >> > >> >>> > > >> >> > >> >>> > >> >> > >> >>> >> >> > >> >> >> > >> >> --------------------------------------------------------------------- >> >> > >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> >> > >> For additional commands, e-mail: dev-h...@spark.apache.org >> >> > >> >> >> > >> >> >> > >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> For additional commands, e-mail: dev-h...@spark.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org