[
https://issues.apache.org/jira/browse/HIVEMALL-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071902#comment-16071902
]
Charles Pritchard commented on HIVEMALL-62:
-------------------------------------------
Appears to be a duplicate of HIVEMALL-62
> Support a function to convert a comma-separated string into typed data and
> vice versa
> -------------------------------------------------------------------------------------
>
> Key: HIVEMALL-62
> URL: https://issues.apache.org/jira/browse/HIVEMALL-62
> Project: Hivemall
> Issue Type: New Feature
> Reporter: Takeshi Yamamuro
> Priority: Minor
>
> Currently, Spark does not have this feature (IMO it will not appear as a
> first-class one in Spark), but it is useful for ETL before ML
> processing.
> e.g.)
> {code}
> scala> val ds1 = Seq("""1,abc""").toDS()
> ds1: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> val schema = new StructType().add("a", IntegerType).add("b", StringType)
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true), StructField(b,StringType,true))
> scala> val ds2 = ds1.select(from_csv($"value", schema))
> ds2: org.apache.spark.sql.DataFrame = [csvtostruct(value): struct<a: int, b: string>]
> scala> ds2.printSchema
> root
>  |-- csvtostruct(value): struct (nullable = true)
>  |    |-- a: integer (nullable = true)
>  |    |-- b: string (nullable = true)
> scala> ds2.show
> +------------------+
> |csvtostruct(value)|
> +------------------+
> |           [1,abc]|
> +------------------+
> {code}
> A related discussion is here:
> https://github.com/apache/spark/pull/13300#issuecomment-261962773
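The quoted example covers only the string-to-struct direction. As a rough illustration of the "vice versa" half of the issue title, here is a minimal Spark-free sketch of the round trip in plain Scala. All names in it (FieldType, IntField, StringField, CsvRoundTrip, fromCsv, toCsv) are hypothetical and are not Hivemall or Spark APIs, and it assumes values contain no quoted or escaped commas.

```scala
// Hypothetical sketch: none of these names come from Hivemall or Spark.
sealed trait FieldType
case object IntField extends FieldType
case object StringField extends FieldType

object CsvRoundTrip {
  // Parse one comma-separated line into typed values, in schema order.
  // Returns None if the column count or any value does not fit the schema.
  // Assumption: no quoting or escaping, so every comma is a separator.
  def fromCsv(line: String, schema: Seq[(String, FieldType)]): Option[Seq[Any]] = {
    val cells = line.split(",", -1).toSeq // -1 keeps trailing empty cells
    if (cells.length != schema.length) None
    else {
      val parsed = cells.zip(schema).map {
        case (cell, (_, IntField))    => scala.util.Try(cell.trim.toInt).toOption
        case (cell, (_, StringField)) => Some(cell)
      }
      if (parsed.forall(_.isDefined)) Some(parsed.map(_.get)) else None
    }
  }

  // Reverse direction: render typed values back into a comma-separated line.
  def toCsv(values: Seq[Any]): String = values.mkString(",")
}

object Demo extends App {
  val schema = Seq("a" -> IntField, "b" -> StringField)
  val row = CsvRoundTrip.fromCsv("1,abc", schema)
  println(row)                         // typed values, if parsing succeeded
  println(row.map(CsvRoundTrip.toCsv)) // back to a comma-separated string
}
```

A real implementation would also need to handle quoting, null values, and the remaining column types, which this sketch leaves out.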