[jira] [Updated] (SPARK-20384) value classes on primitives in DataSets

Daniel Davis (JIRA) Wed, 19 Apr 2017 02:56:52 -0700

     [ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Daniel Davis updated SPARK-20384:
---------------------------------
    Description: 
As a Spark user who models domain objects with Scala value classes, I would 
also like to use them in Datasets. 

For example, I would like to use the {{User}} case class, which uses a 
value class for its {{id}}, as the type of a Dataset:
- the underlying primitive should be mapped to the value-class column
- functions on the column (for example, comparison) should only work if they 
are defined on the value class, and should use those implementations
- show() should pick up the toString method of the value class

{code}
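// Value class wrapping a primitive Long.
case class Id(value: Long) extends AnyVal {
  // `override` is required here; toHexString yields lower-case digits,
  // so upper-case them to match the expected output.
  override def toString: String = value.toHexString.toUpperCase
}
case class User(id: Id, name: String)

// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import spark.implicits._

spark.sparkContext
  .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
  .withColumnRenamed("_1", "id")
  .withColumnRenamed("_2", "name")
  .as[User].show()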
{code}

expected output:
{noformat}
+---+-------+
| id|   name|
+---+-------+
|  0| name-0|
|  1| name-1|
|  2| name-2|
|  3| name-3|
|  4| name-4|
|  5| name-5|
|  6| name-6|
|  7| name-7|
|  8| name-8|
|  9| name-9|
|  A|name-10|
|  B|name-11|
|  C|name-12|
+---+-------+
{noformat}
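
The {{id}} column in the table above can be reproduced without Spark. The following is a minimal plain-Scala sketch of the requested rendering, not a Spark implementation. It assumes that show() would call toString on the value class. Since {{Long.toHexString}} returns lower-case digits, the sketch upper-cases them to match the table. The {{<}} method and the {{ValueClassSketch}} object are hypothetical names added for illustration only.

```scala
// Plain Scala sketch (no Spark): what the `id` column would print
// if show() used Id.toString.
final case class Id(value: Long) extends AnyVal {
  // Long.toHexString yields lower-case digits; upper-case them to match the table.
  override def toString: String = value.toHexString.toUpperCase
  // Hypothetical comparison defined on the value class itself
  // (illustrative only, not part of the issue text).
  def <(other: Id): Boolean = value < other.value
}
final case class User(id: Id, name: String)

object ValueClassSketch {
  val users: Seq[User] = (0L to 12L).map(i => User(Id(i), s"name-$i"))

  // One "id|name" string per row, using Id.toString for the first column.
  val rendered: Seq[String] = users.map(u => s"${u.id}|${u.name}")

  def main(args: Array[String]): Unit = rendered.foreach(println)
}
```

Because the comparison lives on {{Id}}, an expression such as {{Id(1) < Id(2)}} compiles only where the value class defines it, which is the behaviour the second bullet asks the column functions to mirror.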

  was:
As a Spark user who models domain objects with Scala value classes, I would 
also like to use them in Datasets. 

For example, I would like to use the {{User}} case class, which uses a 
value class for its {{id}}, as the type of a Dataset:
- the underlying primitive should be mapped to the value-class column
- functions on the column (for example, comparison) should only work if they 
are defined on the value class, and should use those implementations
- show() should pick up the toString method of the value class

{code}
case class Id(value: Long) extends AnyVal {
  override def toString: String = value.toHexString
}
case class User(id: Id, name: String)

spark.sparkContext
  .parallelize(0L to 10L).map(i => (i, f"name-$i")).toDS()
  .withColumnRenamed("_1", "id")
  .withColumnRenamed("_2", "name").as[User]
{code}


> value classes on primitives in DataSets
> ---------------------------------------
>
>                 Key: SPARK-20384
>                 URL: https://issues.apache.org/jira/browse/SPARK-20384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Daniel Davis
>            Priority: Minor


