Pat McDonough created SPARK-16641:
-------------------------------------
Summary: Add an Option to Create a Dataset With a Case Class,
Ignoring Column Names (Using ordinal instead)
Key: SPARK-16641
URL: https://issues.apache.org/jira/browse/SPARK-16641
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0
Reporter: Pat McDonough
Priority: Minor
When working with a CSV that has no header row, there isn't a concise method to
create a Dataset using a case class. An option to map fields by ordinal rather
than field name would be great.
For example, given the following case class:
{code}
case class Part(partkey: Int, name: String, mfgr: String, brand: String,
  _type: String, size: Int, container: String, retailprice: Double,
  comments: String)
{code}
I'd like to use the following:
{code}
val parts = spark.read.option("delimiter", "|").option("header", "false")
  .csv("dbfs:/databricks-datasets/tpch/data-001/part/").as[Part]
{code}
But that won't work because the default column names (_c0, _c1, _c2, ...) do
not match the case class field names.
Instead, I end up writing a bunch of extra conversion code in a map function:
{code}
val parts = spark.read.option("delimiter", "|").option("header", "false")
  .csv("dbfs:/databricks-datasets/tpch/data-001/part/")
  .map(p =>
    // Convert each all-string row to a Part by column position.
    Part(p.getString(0).trim().toInt, p.getString(1), p.getString(2),
      p.getString(3), p.getString(4), p.getString(5).trim().toInt, p.getString(6),
      p.getString(7).trim().toDouble, p.getString(8)))
{code}
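For comparison, one possible interim workaround is to derive a schema from the case class and hand it to the reader, so that columns are assigned by position and typed up front. This is only a sketch, not the option requested here. It assumes the file has exactly the case class's columns in the same order, and that the values parse without the trim() calls used above. The object name PartsByOrdinal and the local master setting are illustrative only.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

case class Part(partkey: Int, name: String, mfgr: String, brand: String,
  _type: String, size: Int, container: String, retailprice: Double,
  comments: String)

// Hypothetical wrapper object, used only to make the sketch self-contained.
object PartsByOrdinal {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in a notebook, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("parts-by-ordinal")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Build a StructType whose field names and types mirror the case class.
    val partSchema = Encoders.product[Part].schema

    // With an explicit schema, CSV columns are matched by position,
    // so the resulting column names line up with the case class fields.
    val parts: Dataset[Part] = spark.read
      .schema(partSchema)
      .option("delimiter", "|")
      .option("header", "false")
      .csv("dbfs:/databricks-datasets/tpch/data-001/part/")
      .as[Part]

    parts.show(5)
    spark.stop()
  }
}
```

This still means building the schema by hand before the read, so an ordinal-mapping option on the reader or on as[T] would remain the more concise route.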
CC: [~rxin]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)