[ https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287735#comment-16287735 ]

Adamos Loizou commented on SPARK-22351:
---------------------------------------

Hello guys, I've run into this problem once more, this time with an ADT / sealed 
trait hierarchy.
For reference, other people are already facing this issue ([stack overflow 
link|https://stackoverflow.com/questions/41030073/encode-an-adt-sealed-trait-hierarchy-into-spark-dataset-column]).
Here is an example:

{code:java}
sealed trait Fruit
case object Apple extends Fruit
case object Orange extends Fruit

case class Bag(quantity: Int, fruit: Fruit)

Seq(Bag(1, Apple), Bag(3, Orange)).toDS // <- This fails because it can't find an encoder for Fruit
{code}
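
The only fallback I'm aware of today (and what the Stack Overflow thread above ends up with) is a Kryo-based encoder for the whole row type. A minimal sketch, assuming {{spark.implicits._}} is in scope; it compiles and runs, but serialises {{Bag}} into a single binary column, so the typed schema is lost:

{code:java}
import org.apache.spark.sql.{Encoder, Encoders}

// Kryo fallback: being a non-generic implicit val, this wins implicit
// resolution over the generic product encoder from spark.implicits._.
implicit val bagEncoder: Encoder[Bag] = Encoders.kryo[Bag]

// Compiles now, but the resulting Dataset has one opaque binary column.
val bags = Seq(Bag(1, Apple), Bag(3, Orange)).toDS
{code}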

Ideally I'd like to be able to create my own encoder where I can tell it, for 
example, to map a case object to a String column via its {{toString}} method.
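
Until such an API exists, the only schema-preserving option I see is converting by hand at the boundary. A sketch of what I mean ({{BagRow}}, {{toRow}} and {{fromRow}} are names I'm making up):

{code:java}
// Parallel "row" model that represents the ADT as a plain String column.
case class BagRow(quantity: Int, fruit: String)

def toRow(b: Bag): BagRow = BagRow(b.quantity, b.fruit.toString)

def fromRow(r: BagRow): Bag = Bag(r.quantity, r.fruit match {
  case "Apple"  => Apple
  case "Orange" => Orange
})

// Dataset[BagRow] gets a proper (int, string) schema from the product encoder.
val ds = Seq(Bag(1, Apple), Bag(3, Orange)).map(toRow).toDS
{code}

It works, but it means maintaining two parallel models and converting everywhere, which is exactly the kind of pollution a custom encoder API would avoid.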

How feasible would it be to expose an API for creating custom encoders?
Unfortunately, not having one limits the capacity for generalised, type-safe 
models quite a bit.

Thank you.

> Support user-created custom Encoders for Datasets
> -------------------------------------------------
>
>                 Key: SPARK-22351
>                 URL: https://issues.apache.org/jira/browse/SPARK-22351
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Adamos Loizou
>            Priority: Minor
>
> It would be very helpful if we could easily support creating custom encoders 
> for classes in Spark SQL.
> This would allow users to define a proper business model using types of their 
> choice, and then map those to Spark SQL types without being forced to pollute 
> the model with the built-in mappable types (e.g. {{java.sql.Timestamp}}).
> Specifically, in our case we tend to use either the Java 8 time API or the 
> Joda-Time API for dates instead of {{java.sql.Timestamp}}, whose API is quite 
> limited compared to the others.
> Ideally we would like to be able to have a dataset of such a class:
> {code:java}
> case class Person(name: String, dateOfBirth: org.joda.time.LocalDate)
> implicit def localDateEncoder: Encoder[LocalDate] = ??? // we define something that maps to Spark SQL TimestampType
> ...
> // read the CSV and map it to the model
> val people: Dataset[Person] = spark.read.csv("/my/path/file.csv").as[Person]
> {code}
> While this was possible in Spark 1.6, it's no longer the case in Spark 2.x.
> It's also not straightforward how to support this using an 
> {{ExpressionEncoder}} (any tips would be much appreciated).
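>
> In the meantime, the closest we've come is converting at the boundary: read into a raw model that uses the supported {{java.sql.Date}}, then map to the Joda model. A sketch, assuming {{spark.implicits._}} is in scope ({{PersonRaw}} and the sample data are mine; note the {{map}} needs an explicit encoder, and Kryo only yields a binary column):
> {code:java}
> import org.apache.spark.sql.{Dataset, Encoder, Encoders}
> import org.joda.time.LocalDate
>
> // Raw model using a type Spark SQL can encode out of the box.
> case class PersonRaw(name: String, dateOfBirth: java.sql.Date)
>
> // No product encoder can be derived for Person (it has a Joda field),
> // so fall back to Kryo for the mapped Dataset.
> implicit val personEncoder: Encoder[Person] = Encoders.kryo[Person]
>
> val raw: Dataset[PersonRaw] =
>   Seq(PersonRaw("Ada", java.sql.Date.valueOf("1815-12-10"))).toDS
>
> val people: Dataset[Person] =
>   raw.map(r => Person(r.name, LocalDate.fromDateFields(r.dateOfBirth)))
> {code}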
> Thanks.


