[ https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287735#comment-16287735 ]
Adamos Loizou commented on SPARK-22351:
---------------------------------------

Hello guys, once again I've run into this problem, this time with ADT/sealed-hierarchy examples. For reference, other people are already facing this issue ([stack overflow link|https://stackoverflow.com/questions/41030073/encode-an-adt-sealed-trait-hierarchy-into-spark-dataset-column]). Here is an example:
{code:java}
sealed trait Fruit
case object Apple extends Fruit
case object Orange extends Fruit

case class Bag(quantity: Int, fruit: Fruit)

Seq(Bag(1, Apple), Bag(3, Orange)).toDS // <- This fails because it can't find an encoder for Fruit
{code}
Ideally I'd like to be able to create my own encoder where I can tell it, for example, to use the case object's toString method to map it to a String column. How feasible would it be to expose an API for creating custom encoders? Unfortunately, not having this limits the capacity for generalised and typesafe models quite a bit. Thank you.

> Support user-created custom Encoders for Datasets
> -------------------------------------------------
>
>                 Key: SPARK-22351
>                 URL: https://issues.apache.org/jira/browse/SPARK-22351
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Adamos Loizou
>            Priority: Minor
>
> It would be very helpful if we could easily support creating custom encoders
> for classes in Spark SQL.
> This is to allow a user to properly define a business model using types of
> their choice. They can then map them to Spark SQL types without being forced
> to pollute their model with the built-in mappable types (e.g.
> {{java.sql.Timestamp}}).
> Specifically in our case, we tend to use either the Java 8 time API or the
> Joda-Time API for dates instead of {{java.sql.Timestamp}}, whose API is quite
> limited compared to the others.
> Ideally we would like to be able to have a dataset of such a class:
> {code:java}
> case class Person(name: String, dateOfBirth: org.joda.time.LocalDate)
>
> // we define something that maps to Spark SQL TimestampType
> implicit def localDateTimeEncoder: Encoder[LocalDate] = ???
>
> ...
>
> // read csv and map it to model
> val people: Dataset[Person] = spark.read.csv("/my/path/file.csv").as[Person]
> {code}
> While this was possible in Spark 1.6, it's no longer the case in Spark 2.x.
> It's also not straightforward to support this using an
> {{ExpressionEncoder}} (any tips would be much appreciated).
> Thanks.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
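The sealed-hierarchy example from the comment above can be worked around on Spark 2.x without a custom encoder by keeping the ADT out of the Dataset schema and converting at the boundary. Below is a sketch of that approach; it assumes a {{SparkSession}} named {{spark}} is in scope, and {{BagRow}} and {{Fruit.fromString}} are illustrative names, not Spark API:
{code:java}
sealed trait Fruit
case object Apple extends Fruit
case object Orange extends Fruit

object Fruit {
  // Inverse of the case objects' toString; illustrative helper, not Spark API.
  def fromString(s: String): Fruit = s match {
    case "Apple"  => Apple
    case "Orange" => Orange
  }
}

case class Bag(quantity: Int, fruit: Fruit)

// Surrogate row type containing only fields Spark can encode out of the box.
case class BagRow(quantity: Int, fruit: String)

import spark.implicits._

// Convert to the surrogate before toDS, so only Int and String need encoding.
val ds = Seq(Bag(1, Apple), Bag(3, Orange))
  .map(b => BagRow(b.quantity, b.fruit.toString))
  .toDS

// Map back to the domain model after collecting.
val bags = ds.collect().map(r => Bag(r.quantity, Fruit.fromString(r.fruit)))
{code}
Alternatively, {{org.apache.spark.sql.Encoders.kryo[Fruit]}} yields an implicit {{Encoder[Fruit]}} with no extra code, but it serializes values into an opaque binary column rather than the readable String column asked for here.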
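For the Joda-Time case in the issue description, a similar surrogate-type sketch works on Spark 2.x until custom encoders are exposed: keep {{java.sql.Date}} in the Dataset row and convert to {{org.joda.time.LocalDate}} at the edges. {{PersonRow}}, {{toRow}}, and {{fromRow}} are illustrative names, not Spark API:
{code:java}
import java.sql.Date
import org.joda.time.LocalDate

// Domain model using Joda-Time, as in the issue description.
case class Person(name: String, dateOfBirth: LocalDate)

// Surrogate row type using a date representation Spark can encode.
case class PersonRow(name: String, dateOfBirth: Date)

// Joda's LocalDate.toString is ISO yyyy-MM-dd, which Date.valueOf accepts.
def toRow(p: Person): PersonRow =
  PersonRow(p.name, Date.valueOf(p.dateOfBirth.toString))

// java.sql.Date.toString is also yyyy-MM-dd, which LocalDate.parse accepts.
def fromRow(r: PersonRow): Person =
  Person(r.name, LocalDate.parse(r.dateOfBirth.toString))

// With a SparkSession `spark` in scope:
// import spark.implicits._
// val rows: Dataset[PersonRow] = spark.read.csv("/my/path/file.csv").as[PersonRow]
// val people: Seq[Person] = rows.collect().map(fromRow)
{code}
This keeps the Spark-specific type confined to the row class instead of polluting the whole model, at the cost of one conversion per boundary crossing.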