[
https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Davis updated SPARK-20384:
---------------------------------
Description:
As a Spark user who models domain objects with Scala value classes, I would
also like to use them in Datasets.
For example, I would like to use the {{User}} case class, which uses a
value class for its {{id}}, as the type of a Dataset:
- the underlying primitive should be mapped to the value-class column
- functions on the column (for example, comparison) should only work if they
are defined on the value class, and should use that implementation
- show() should pick up the toString method of the value class
{code}
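import spark.implicits._ // needed for toDS() and as[User]

case class Id(value: Long) extends AnyVal {
  // toString overrides Any.toString, so `override` is required;
  // toHexString yields lowercase digits, hence toUpperCase
  override def toString: String = value.toHexString.toUpperCase
}
case class User(id: Id, name: String)
spark.sparkContext
  .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
  .withColumnRenamed("_1", "id")
  .withColumnRenamed("_2", "name")
  .as[User].show()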
{code}
expected output:
{noformat}
+---+-------+
| id| name|
+---+-------+
| 0| name-0|
| 1| name-1|
| 2| name-2|
| 3| name-3|
| 4| name-4|
| 5| name-5|
| 6| name-6|
| 7| name-7|
| 8| name-8|
| 9| name-9|
| A|name-10|
| B|name-11|
| C|name-12|
+---+-------+
{noformat}
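To illustrate the requested semantics without Spark, here is a minimal sketch
in plain Scala. A Seq stands in for the Dataset, so it says nothing about how
Spark encoders would handle value classes. The wrapper object
{{ValueClassSketch}}, the {{<}} method and {{renderedIds}} are hypothetical
names used only for illustration. The {{.toUpperCase}} call is an assumption,
added because {{toHexString}} yields lowercase digits while the expected
output above shows uppercase ones.

```scala
object ValueClassSketch {
  // Value class wrapping a Long; overriding toString requires `override`.
  final case class Id(value: Long) extends AnyVal {
    // Assumption: uppercase hex, to match the expected output table.
    override def toString: String = value.toHexString.toUpperCase
    // A comparison defined on the value class itself (hypothetical example).
    def <(other: Id): Boolean = value < other.value
  }

  final case class User(id: Id, name: String)

  // A plain Scala collection stands in for a Dataset[User].
  val users: Seq[User] = (0L to 12L).map(i => User(Id(i), s"name-$i"))

  // The strings the requested show() would put in the id column.
  def renderedIds: Seq[String] = users.map(_.id.toString)

  def main(args: Array[String]): Unit =
    users.foreach(u => println(f"|${u.id}%3s|${u.name}%7s|"))
}
```

In this sketch the last three entries of {{renderedIds}} are A, B and C,
matching the rows for name-10 to name-12 in the table above.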
was:
As a Spark user who models domain objects with Scala value classes, I would
also like to use them in Datasets.
For example, I would like to use the {{User}} case class, which uses a
value class for its {{id}}, as the type of a Dataset:
- the underlying primitive should be mapped to the value-class column
- functions on the column (for example, comparison) should only work if they
are defined on the value class, and should use that implementation
- show() should pick up the toString method of the value class
{code}
case class Id(value: Long) extends AnyVal {
  override def toString: String = value.toHexString
}
case class User(id: Id, name: String)
spark.sparkContext
  .parallelize(0L to 10L).map(i => (i, f"name-$i")).toDS()
  .withColumnRenamed("_1", "id")
  .withColumnRenamed("_2", "name").as[User]
{code}
> value classes on primitives in DataSets
> ---------------------------------------
>
> Key: SPARK-20384
> URL: https://issues.apache.org/jira/browse/SPARK-20384
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer, SQL
> Affects Versions: 2.1.0
> Reporter: Daniel Davis
> Priority: Minor