Awesome, this is a great idea. I opened SPARK-18122 <https://issues.apache.org/jira/browse/SPARK-18122>.
On Wed, Oct 26, 2016 at 2:11 PM, Koert Kuipers <ko...@tresata.com> wrote:
> If kryo could transparently be used for subtrees without narrowing the
> implicit, that would be great.

On Wed, Oct 26, 2016 at 5:10 PM, Koert Kuipers <ko...@tresata.com> wrote:
> I use kryo for the whole thing currently. It would be better to use it
> for the subtree.

On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbrust <mich...@databricks.com> wrote:
> You use the kryo encoder for the whole thing? Or just the subtree that
> we don't have specific encoders for?
>
> Also, I'm saying I like the idea of having a kryo fallback. I don't see
> the point of narrowing the definition of the implicit.

On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers <ko...@tresata.com> wrote:
> For example (the log shows when it creates a kryo encoder):
>
>     scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
>     res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] =
>     class[value[0]: array<string>]
>
>     scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
>     dataframe.EncoderEvidence$: using kryo encoder for scala.Option[Set[String]]
>     res6: org.apache.spark.sql.Encoder[Option[Set[String]]] =
>     class[value[0]: binary]

On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers <ko...@tresata.com> wrote:
> Why would generating implicits for ProductN, where you also require the
> elements in the Product to have an expression encoder, not work?
>
> We do this, and then we have a generic fallback that produces a kryo
> encoder.
> For us the result is that, say, an implicit for Seq[(Int, Seq[(String, Int)])]
> will create a new ExpressionEncoder(), while an implicit for
> Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder().

On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust <mich...@databricks.com> wrote:
> Sorry, I realize that Set is only one example here, but I don't think
> that making the type of the implicit more narrow to include only
> ProductN or something eliminates the issue. Even with that change, we
> will fail to generate an encoder with the same error if, for example, a
> field of your case class has an unsupported type.
>
> Short of changing this to compile-time macros, I think we are stuck with
> this class of errors at runtime. The simplest solution seems to be to
> expand the set of things we can handle as much as possible and allow
> users to turn on a kryo fallback for expression encoders. I'd be
> hesitant to make this the default though, as behavior would change with
> each release that adds support for more types. I would be very
> supportive of making this fallback a built-in option though.

On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com> wrote:
> Yup, it doesn't really solve the underlying issue.
>
> We fixed it internally by having our own typeclass that produces
> encoders and does check the contents of the products, but we did this by
> simply supporting Tuple1 through Tuple22 and Option explicitly, and not
> supporting Product, since we don't have a need for case classes.
>
> If case classes extended ProductN (which I think they will in Scala
> 2.12?) then we could drop Product and support Product1 through Product22
> and Option explicitly while checking the classes they contain. That
> would be the cleanest.
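The fallback scheme Koert describes can be sketched without Spark. The names below (`Ev`, `evidenceOf`, the `"expression"`/`"kryo"` labels) are illustrative stand-ins for the `EncoderEvidence` typeclass in the thread, not Spark's API: specific instances live in the companion object, the generic kryo-style fallback sits in a lower-priority parent trait, and a type like `Set` with no specific instance falls through to the fallback even when nested inside a tuple or `Seq`.

```scala
// Toy model of a typeclass with a low-priority kryo-style fallback.
// "Ev" stands in for an encoder-evidence typeclass; "kind" records which
// encoder family implicit resolution chose.
trait Ev[T] { def kind: String }

trait LowPriorityEv {
  // Generic fallback: any type without a more specific instance is "kryo".
  implicit def kryoFallback[T]: Ev[T] = new Ev[T] { val kind = "kryo" }
}

object Ev extends LowPriorityEv {
  implicit val intEv: Ev[Int] = new Ev[Int] { val kind = "expression" }
  implicit val stringEv: Ev[String] = new Ev[String] { val kind = "expression" }

  // Seq is only as encodable as its element type.
  implicit def seqEv[T](implicit ev: Ev[T]): Ev[Seq[T]] =
    new Ev[Seq[T]] { val kind = ev.kind }

  // Tuple2 checks both element types; if either side fell back to kryo,
  // the whole tuple does too.
  implicit def tuple2Ev[A, B](implicit a: Ev[A], b: Ev[B]): Ev[(A, B)] =
    new Ev[(A, B)] {
      val kind =
        if (a.kind == "expression" && b.kind == "expression") "expression"
        else "kryo"
    }

  // Deliberately no instance for Set: Set[T] hits the low-priority fallback.
}

def evidenceOf[T](implicit ev: Ev[T]): String = ev.kind
```

This mirrors the behavior in the thread: `evidenceOf[Seq[(Int, Seq[String])]]` resolves to `"expression"`, while `evidenceOf[Seq[(Int, Set[Int])]]` resolves to `"kryo"` because `Set` only matches the fallback. Putting the fallback in a parent trait is what keeps it from shadowing the specific instances, since implicits inherited from a supertype lose to those defined directly in the companion.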
On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com> wrote:
> Isn't the problem that Option is a Product and the class it contains
> isn't checked? Adding support for Set fixes the example, but the problem
> would happen with any class there isn't an encoder for, right?
>
> --
> Ryan Blue
> Software Engineer
> Netflix

On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <mich...@databricks.com> wrote:
> Hmm, that is unfortunate. Maybe the best solution is to add support for
> sets? I don't think that would be super hard.

On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <ko...@tresata.com> wrote:
> I am trying to use encoders as a typeclass where, if it fails to find an
> ExpressionEncoder, it falls back to a KryoEncoder.
>
> The issue seems to be that ExpressionEncoder claims a little more than
> it can handle here:
>
>     implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>       Encoders.product[T]
>
> This "claims" to handle, for example, Option[Set[Int]], but it really
> cannot handle Set, so it leads to a runtime exception.
>
> Would it be useful to make this a little more specific? I guess the
> challenge is going to be case classes, which unfortunately don't extend
> Product1, Product2, etc.
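Until a built-in fallback exists, a user-side workaround consistent with the thread is to pin an explicit `Encoders.kryo` encoder for the specific type that `newProductEncoder` accepts at compile time but fails on at runtime. A minimal sketch, assuming Spark 2.x (the `SparkSession` setup and dataset contents are illustrative):

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

object KryoFallbackSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("kryo-fallback-sketch")
      .getOrCreate()

    // newProductEncoder would accept Option[Set[Int]] (Option is a Product)
    // but throw at runtime because Set is unsupported. Supplying a kryo
    // encoder explicitly sidesteps that: the value is stored as opaque
    // binary instead of a typed array column.
    val optSetEnc: Encoder[Option[Set[Int]]] = Encoders.kryo[Option[Set[Int]]]

    val ds = spark.createDataset(Seq(Option(Set(1, 2, 3)), None))(optSetEnc)
    ds.show()

    spark.stop()
  }
}
```

Passing the encoder explicitly (rather than relying on `import spark.implicits._`) also avoids any ambiguity with the built-in `newProductEncoder`, which would otherwise still match `Option[Set[Int]]`.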