How do these proposals affect PySpark? I think compatibility with PySpark through Py4J should be considered.
On Mon, Mar 9, 2015 at 8:39 PM, Patrick Wendell <pwend...@gmail.com> wrote: > Does this matter for our own internal types in Spark? I don't think > any of these types are designed to be used in RDD records, for > instance. > > On Mon, Mar 9, 2015 at 6:25 PM, Aaron Davidson <ilike...@gmail.com> wrote: > > Perhaps the problem with Java enums that was brought up was actually that > > their hashCode is not stable across JVMs, as it depends on the memory > > location of the enum itself. > > > > On Mon, Mar 9, 2015 at 6:15 PM, Imran Rashid <iras...@cloudera.com> > wrote: > > > >> Can you expand on the serde issues w/ java enum's at all? I haven't > heard > >> of any problems specific to enums. The java object serialization rules > >> seem very clear and it doesn't seem like different jvms should have a > >> choice on what they do: > >> > >> > >> > http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469 > >> > >> (in a nutshell, serialization must use enum.name()) > >> > >> of course there are plenty of ways the user could screw this up(eg. > rename > >> the enums, or change their meaning, or remove them). But then again, > all > >> of java serialization has issues w/ serialization the user has to be > aware > >> of. Eg., if we go with case objects, than java serialization blows up > if > >> you add another helper method, even if that helper method is completely > >> compatible. > >> > >> Some prior debate in the scala community: > >> > >> > https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ > >> > >> SO post on which version to use in scala: > >> > >> > >> > http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types > >> > >> SO post about the macro-craziness people try to add to scala to make > them > >> almost as good as a simple java enum: > >> (NB: the accepted answer doesn't actually work in all cases ...) > >> > >> > >> > http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched > >> > >> Another proposal to add better enums built into scala ... but seems to > be > >> dormant: > >> > >> https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk > >> > >> > >> > >> On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mri...@gmail.com> > >> wrote: > >> > >> > I have a strong dislike for java enum's due to the fact that they > >> > are not stable across JVM's - if it undergoes serde, you end up with > >> > unpredictable results at times [1]. > >> > One of the reasons why we prevent enum's from being key : though it is > >> > highly possible users might depend on it internally and shoot > >> > themselves in the foot. > >> > > >> > Would be better to keep away from them in general and use something > more > >> > stable. > >> > > >> > Regards, > >> > Mridul > >> > > >> > [1] Having had to debug this issue for 2 weeks - I really really hate > it. > >> > > >> > > >> > On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <iras...@cloudera.com> > >> wrote: > >> > > I have a very strong dislike for #1 (scala enumerations). I'm ok > with > >> > #4 > >> > > (with Xiangrui's final suggestion, especially making it sealed & > >> > available > >> > > in Java), but I really think #2, java enums, are the best option. > >> > > > >> > > Java enums actually have some very real advantages over the other > >> > > approaches -- you get values(), valueOf(), EnumSet, and EnumMap. > There > >> > has > >> > > been endless debate in the Scala community about the problems with > the > >> > > approaches in Scala. Very smart, level-headed Scala gurus have > >> > complained > >> > > about their short-comings (Rex Kerr's name is coming to mind, though > >> I'm > >> > > not positive about that); there have been numerous well-thought out > >> > > proposals to give Scala a better enum. But the powers-that-be in > Scala > >> > > always reject them. IIRC the explanation for rejecting is basically > >> that > >> > > (a) enums aren't important enough for introducing some new special > >> > feature, > >> > > scala's got bigger things to work on and (b) if you really need a > good > >> > > enum, just use java's enum. > >> > > > >> > > I doubt it really matters that much for Spark internals, which is > why I > >> > > think #4 is fine. But I figured I'd give my spiel, because every > >> > developer > >> > > loves language wars :) > >> > > > >> > > Imran > >> > > > >> > > > >> > > > >> > > On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <men...@gmail.com> > >> wrote: > >> > > > >> > >> `case object` inside an `object` doesn't show up in Java. This is > the > >> > >> minimal code I found to make everything show up correctly in both > >> > >> Scala and Java: > >> > >> > >> > >> sealed abstract class StorageLevel // cannot be a trait > >> > >> > >> > >> object StorageLevel { > >> > >> private[this] case object _MemoryOnly extends StorageLevel > >> > >> final val MemoryOnly: StorageLevel = _MemoryOnly > >> > >> > >> > >> private[this] case object _DiskOnly extends StorageLevel > >> > >> final val DiskOnly: StorageLevel = _DiskOnly > >> > >> } > >> > >> > >> > >> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell < > pwend...@gmail.com> > >> > >> wrote: > >> > >> > I like #4 as well and agree with Aaron's suggestion. > >> > >> > > >> > >> > - Patrick > >> > >> > > >> > >> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson < > ilike...@gmail.com> > >> > >> wrote: > >> > >> >> I'm cool with #4 as well, but make sure we dictate that the > values > >> > >> should > >> > >> >> be defined within an object with the same name as the > enumeration > >> > (like > >> > >> we > >> > >> >> do for StorageLevel). Otherwise we may pollute a higher > namespace. > >> > >> >> > >> > >> >> e.g. we SHOULD do: > >> > >> >> > >> > >> >> trait StorageLevel > >> > >> >> object StorageLevel { > >> > >> >> case object MemoryOnly extends StorageLevel > >> > >> >> case object DiskOnly extends StorageLevel > >> > >> >> } > >> > >> >> > >> > >> >> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust < > >> > >> mich...@databricks.com> > >> > >> >> wrote: > >> > >> >> > >> > >> >>> #4 with a preference for CamelCaseEnums > >> > >> >>> > >> > >> >>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley < > >> > jos...@databricks.com> > >> > >> >>> wrote: > >> > >> >>> > >> > >> >>> > another vote for #4 > >> > >> >>> > People are already used to adding "()" in Java. > >> > >> >>> > > >> > >> >>> > > >> > >> >>> > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch < > >> java...@gmail.com > >> > > > >> > >> >>> wrote: > >> > >> >>> > > >> > >> >>> > > #4 but with MemoryOnly (more scala-like) > >> > >> >>> > > > >> > >> >>> > > http://docs.scala-lang.org/style/naming-conventions.html > >> > >> >>> > > > >> > >> >>> > > Constants, Values, Variable and Methods > >> > >> >>> > > > >> > >> >>> > > Constant names should be in upper camel case. That is, if > the > >> > >> member is > >> > >> >>> > > final, immutable and it belongs to a package object or an > >> > object, > >> > >> it > >> > >> >>> may > >> > >> >>> > be > >> > >> >>> > > considered a constant (similar to Java'sstatic final > members): > >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > 1. object Container { > >> > >> >>> > > 2. val MyConstant = ... > >> > >> >>> > > 3. } > >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <men...@gmail.com > >: > >> > >> >>> > > > >> > >> >>> > > > Hi all, > >> > >> >>> > > > > >> > >> >>> > > > There are many places where we use enum-like types in > Spark, > >> > but > >> > >> in > >> > >> >>> > > > different ways. Every approach has both pros and cons. I > >> > wonder > >> > >> >>> > > > whether there should be an "official" approach for > enum-like > >> > >> types in > >> > >> >>> > > > Spark. > >> > >> >>> > > > > >> > >> >>> > > > 1. Scala's Enumeration (e.g., SchedulingMode, > WorkerState, > >> > etc) > >> > >> >>> > > > > >> > >> >>> > > > * All types show up as Enumeration.Value in Java. > >> > >> >>> > > > > >> > >> >>> > > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> > >> > > >> > http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html > >> > >> >>> > > > > >> > >> >>> > > > 2. Java's Enum (e.g., SaveMode, IOMode) > >> > >> >>> > > > > >> > >> >>> > > > * Implementation must be in a Java file. > >> > >> >>> > > > * Values doesn't show up in the ScalaDoc: > >> > >> >>> > > > > >> > >> >>> > > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> > >> > > >> > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode > >> > >> >>> > > > > >> > >> >>> > > > 3. Static fields in Java (e.g., TripletFields) > >> > >> >>> > > > > >> > >> >>> > > > * Implementation must be in a Java file. > >> > >> >>> > > > * Doesn't need "()" in Java code. > >> > >> >>> > > > * Values don't show up in the ScalaDoc: > >> > >> >>> > > > > >> > >> >>> > > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> > >> > > >> > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields > >> > >> >>> > > > > >> > >> >>> > > > 4. Objects in Scala. (e.g., StorageLevel) > >> > >> >>> > > > > >> > >> >>> > > > * Needs "()" in Java code. > >> > >> >>> > > > * Values show up in both ScalaDoc and JavaDoc: > >> > >> >>> > > > > >> > >> >>> > > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> > >> > > >> > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$ > >> > >> >>> > > > > >> > >> >>> > > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> > >> > > >> > http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html > >> > >> >>> > > > > >> > >> >>> > > > It would be great if we have an "official" approach for > this > >> > as > >> > >> well > >> > >> >>> > > > as the naming convention for enum-like values > ("MEMORY_ONLY" > >> > or > >> > >> >>> > > > "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". > Any > >> > >> >>> thoughts? > >> > >> >>> > > > > >> > >> >>> > > > Best, > >> > >> >>> > > > Xiangrui > >> > >> >>> > > > > >> > >> >>> > > > > >> > >> > --------------------------------------------------------------------- > >> > >> >>> > > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > >> > >> >>> > > > For additional commands, e-mail: > dev-h...@spark.apache.org > >> > >> >>> > > > > >> > >> >>> > > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> > >> > >> > --------------------------------------------------------------------- > >> > >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > >> > >> For additional commands, e-mail: dev-h...@spark.apache.org > >> > >> > >> > >> > >> > > >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > For additional commands, e-mail: dev-h...@spark.apache.org > >