Awesome, this is a great idea.  I opened SPARK-18122
<https://issues.apache.org/jira/browse/SPARK-18122>.

On Wed, Oct 26, 2016 at 2:11 PM, Koert Kuipers <ko...@tresata.com> wrote:

> if kryo could transparently be used for subtrees without narrowing the
> implicit that would be great
>
> On Wed, Oct 26, 2016 at 5:10 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i use kryo for the whole thing currently
>>
>> it would be better to use it for the subtree
>>
>> On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbrust <mich...@databricks.com
>> > wrote:
>>
>>> You use kryo encoder for the whole thing?  Or just the subtree that we
>>> don't have specific encoders for?
>>>
>>> Also, I'm saying I like the idea of having a kryo fallback.  I don't see
>>> the point of narrowing the definition of the implicit.
>>>
>>> On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>>
>>>> for example (the log shows when it creates a kryo encoder):
>>>>
>>>> scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
>>>> res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] =
>>>> class[value[0]: array<string>]
>>>>
>>>> scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
>>>> dataframe.EncoderEvidence$: using kryo encoder for
>>>> scala.Option[Set[String]]
>>>> res6: org.apache.spark.sql.Encoder[Option[Set[String]]] =
>>>> class[value[0]: binary]
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>>
>>>>> why would generating implicits for ProductN where you also require the
>>>>> elements in the Product to have an expression encoder not work?
>>>>>
>>>>> we do this. and then we have a generic fallback where it produces a
>>>>> kryo encoder.
>>>>>
>>>>> for us the result is that say an implicit for Seq[(Int, Seq[(String,
>>>>> Int)])] will create a new ExpressionEncoder(), while an implicit for
>>>>> Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()
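A minimal, Spark-free sketch of the typeclass-with-fallback pattern described above. The names (`EncoderEvidence`, the toy `Encoder` trait with a `kind` field) are illustrative stand-ins, not Spark's real API: the point is only the implicit-priority trick, where product instances that check their element types live in the companion object and the generic kryo-style fallback lives in a lower-priority parent trait.

```scala
// Toy stand-in for org.apache.spark.sql.Encoder; `kind` marks whether we
// derived a "native" (expression-style) encoder or fell back to "kryo".
trait Encoder[T] { def kind: String }

case class EncoderEvidence[T](encoder: Encoder[T])

trait LowPriorityEvidence {
  // Fallback: any type at all gets a binary/kryo-style encoder.
  implicit def kryoEvidence[T]: EncoderEvidence[T] =
    EncoderEvidence(new Encoder[T] { def kind = "kryo" })
}

object EncoderEvidence extends LowPriorityEvidence {
  // Leaf types with a native representation.
  implicit val intEvidence: EncoderEvidence[Int] =
    EncoderEvidence(new Encoder[Int] { def kind = "native" })
  implicit val stringEvidence: EncoderEvidence[String] =
    EncoderEvidence(new Encoder[String] { def kind = "native" })

  // Tuple2 is native only if both elements are: this is the "check the
  // contents of the products" step. In practice you would repeat this for
  // Tuple3 through Tuple22 (and Option), as described in the thread.
  implicit def tuple2Evidence[A, B](implicit
      a: EncoderEvidence[A],
      b: EncoderEvidence[B]): EncoderEvidence[(A, B)] =
    if (a.encoder.kind == "native" && b.encoder.kind == "native")
      EncoderEvidence(new Encoder[(A, B)] { def kind = "native" })
    else
      EncoderEvidence(new Encoder[(A, B)] { def kind = "kryo" })
}

object Demo extends App {
  println(implicitly[EncoderEvidence[(Int, String)]].encoder.kind)   // native
  println(implicitly[EncoderEvidence[(Int, Set[Int])]].encoder.kind) // kryo
}
```

Because `tuple2Evidence` sits in the companion object, it wins over the inherited `kryoEvidence` when both apply, so the fallback only fires for types (like `Set`) that nothing more specific covers.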
>>>>>
>>>>> On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust <
>>>>> mich...@databricks.com> wrote:
>>>>>
>>>>>> Sorry, I realize that set is only one example here, but I don't think
>>>>>> that making the type of the implicit more narrow to include only ProductN
>>>>>> or something eliminates the issue.  Even with that change, we will fail 
>>>>>> to
>>>>>> generate an encoder with the same error if you, for example, have a field
>>>>>> of your case class that is an unsupported type.
>>>>>>
>>>>>> Short of changing this to compile-time macros, I think we are stuck
>>>>>> with this class of errors at runtime.  The simplest solution seems to be 
>>>>>> to
>>>>>> expand the set of things we can handle as much as possible and allow users
>>>>>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>>>>>> make this the default though, as behavior would change with each release
>>>>>> that adds support for more types.  I would be very supportive of making
>>>>>> this fallback a built-in option though.
>>>>>>
>>>>>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> yup, it doesn't really solve the underlying issue.
>>>>>>>
>>>>>>> we fixed it internally by having our own typeclass that produces
>>>>>>> encoders and that does check the contents of the products, but we did 
>>>>>>> this
>>>>>>> by simply supporting Tuple1 - Tuple22 and Option explicitly, and not
>>>>>>> supporting Product, since we don't have a need for case classes.
>>>>>>>
>>>>>>> if case classes extended ProductN (which i think they will in scala
>>>>>>> 2.12?) then we could drop Product and support Product1 - Product22 and
>>>>>>> Option explicitly while checking the classes they contain. that would be
>>>>>>> the cleanest.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Isn't the problem that Option is a Product and the class it
>>>>>>>> contains isn't checked? Adding support for Set fixes the example, but 
>>>>>>>> the
>>>>>>>> problem would happen with any class there isn't an encoder for, right?
>>>>>>>>
>>>>>>>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>>>>>>>> mich...@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> Hmm, that is unfortunate.  Maybe the best solution is to add
>>>>>>>>> support for sets?  I don't think that would be super hard.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> i am trying to use encoders as a typeclass where if it fails to
>>>>>>>>>> find an ExpressionEncoder it falls back to KryoEncoder.
>>>>>>>>>>
>>>>>>>>>> the issue seems to be that ExpressionEncoder claims a little more
>>>>>>>>>> than it can handle here:
>>>>>>>>>>   implicit def newProductEncoder[T <: Product : TypeTag]:
>>>>>>>>>> Encoder[T] = Encoders.product[T]
>>>>>>>>>>
>>>>>>>>>> this "claims" to handle for example Option[Set[Int]], but it
>>>>>>>>>> really cannot handle Set so it leads to a runtime exception.
>>>>>>>>>>
>>>>>>>>>> would it be useful to make this a little more specific? i guess
>>>>>>>>>> the challenge is going to be case classes which unfortunately don't
>>>>>>>>>> extend
>>>>>>>>>> Product1, Product2, etc.
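To make the over-claiming concrete without needing Spark on the classpath: `Option` extends `Product`, so a signature bounded by `T <: Product` (like `newProductEncoder`'s) accepts `Option[Set[Int]]` at compile time without ever looking at what the `Option` contains. The helper name below is hypothetical; it only models the bound, not the encoder construction that would later fail at runtime.

```scala
object BoundDemo extends App {
  // A Product-bounded signature, analogous to newProductEncoder's bound,
  // accepts any Product at compile time without inspecting its contents.
  def claimedByProductBound[T <: Product](t: T): String = t.productPrefix

  // Compiles fine: Some(...) is a Product regardless of what it wraps, so
  // the unsupported Set inside is only discovered when an encoder is
  // actually constructed at runtime.
  println(claimedByProductBound(Some(Set(1, 2, 3)))) // prints "Some"
}
```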
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Software Engineer
>>>>>>>> Netflix
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
