[
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287735#comment-16287735
]
Adamos Loizou commented on SPARK-22351:
---------------------------------------
Hello guys, I've run into this problem once more, this time with ADT/sealed
trait hierarchies.
For reference, there are already people facing this issue ([stack overflow
link|https://stackoverflow.com/questions/41030073/encode-an-adt-sealed-trait-hierarchy-into-spark-dataset-column]).
Here is an example:
{code:java}
sealed trait Fruit
case object Apple extends Fruit
case object Orange extends Fruit
case class Bag(quantity: Int, fruit: Fruit)
Seq(Bag(1, Apple), Bag(3, Orange)).toDS // <- fails: no encoder found for Fruit
{code}
Ideally I'd like to be able to create my own encoder and tell it, for
example, to use the case object's toString method to map it to a String
column.
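As a sketch of the mapping such an encoder would need to perform (plain Scala, no Spark dependency; the object and method names here are hypothetical, not an existing API):
{code:java}
sealed trait Fruit
case object Apple extends Fruit
case object Orange extends Fruit

object FruitCodec {
  // Hypothetical round trip an Encoder[Fruit] could delegate to:
  // serialize via toString, deserialize by matching the known case objects.
  def toStringRepr(f: Fruit): String = f.toString

  def fromStringRepr(s: String): Fruit = s match {
    case "Apple"  => Apple
    case "Orange" => Orange
    case other    => throw new IllegalArgumentException(s"Unknown fruit: $other")
  }
}
{code}
This is only the conversion logic; wiring it into Spark's encoder machinery is exactly the missing piece this ticket asks for.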
How feasible would it be to expose an API for creating custom encoders?
Unfortunately, not having this significantly limits our ability to build
generalised, typesafe models.
Thank you.
> Support user-created custom Encoders for Datasets
> -------------------------------------------------
>
> Key: SPARK-22351
> URL: https://issues.apache.org/jira/browse/SPARK-22351
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Adamos Loizou
> Priority: Minor
>
> It would be very helpful if we could easily support creating custom encoders
> for classes in Spark SQL.
> This is to allow a user to properly define a business model using types of
> their choice. They can then map them to Spark SQL types without being forced
> to pollute their model with the built-in mappable types (e.g.
> {{java.sql.Timestamp}}).
> Specifically in our case, we tend to use either the Java 8 time API or the
> Joda-Time API for dates instead of {{java.sql.Timestamp}}, whose API is quite
> limited compared to the others.
> Ideally we would like to be able to have a dataset of such a class:
> {code:java}
> case class Person(name: String, dateOfBirth: org.joda.time.LocalDate)
> implicit def localDateEncoder: Encoder[LocalDate] = ??? // something that
> maps to Spark SQL TimestampType
> ...
> // read csv and map it to model
> val people: Dataset[Person] = spark.read.csv("/my/path/file.csv").as[Person]
> {code}
> While this was possible in Spark 1.6, it's no longer the case in Spark 2.x.
> It's also not straightforward how to support this using an
> {{ExpressionEncoder}} (any tips would be much appreciated).
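> At its core, such an encoder needs a round trip between the date type and the
> timestamp value Spark stores. A minimal sketch of that conversion, using
> {{java.time.LocalDate}} rather than Joda for brevity (the object and method
> names are hypothetical, not part of any Spark API):
> {code:java}
> import java.sql.Timestamp
> import java.time.{LocalDate, ZoneOffset}
>
> object LocalDateCodec {
>   // Map a LocalDate to the java.sql.Timestamp that Spark's TimestampType
>   // stores, pinning the time-of-day to midnight UTC.
>   def toTimestamp(d: LocalDate): Timestamp =
>     Timestamp.from(d.atStartOfDay(ZoneOffset.UTC).toInstant)
>
>   def fromTimestamp(t: Timestamp): LocalDate =
>     t.toInstant.atZone(ZoneOffset.UTC).toLocalDate
> }
> {code}
> The open question in this ticket is how to plug such a conversion into an
> {{Encoder}} without reaching into {{ExpressionEncoder}} internals.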
> Thanks.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)