[ 
https://issues.apache.org/jira/browse/SPARK-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171999#comment-14171999
 ] 

Xiangrui Meng commented on SPARK-3569:
--------------------------------------

I put the design doc here: 
https://docs.google.com/document/d/1RGJgVJhCebnilpL15ODcq0EWBeVjl9ltoHUvosWodPg/edit?usp=sharing

> Add metadata field to StructField
> ---------------------------------
>
>                 Key: SPARK-3569
>                 URL: https://issues.apache.org/jira/browse/SPARK-3569
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib, SQL
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> Want to add a metadata field to StructField that can be used by other 
> applications like ML to embed more information about the column.
> {code}
> case class StructField(name: String, dataType: DataType, nullable: Boolean,
>   metadata: Map[String, Any] = Map.empty)
> {code}
> For ML, we can store feature information like categorical/continuous, number 
> of categories, category-to-index map, etc.
> One question is how to carry over the metadata in query execution. For 
> example:
> {code}
> val features = schemaRDD.select('features)
> val featuresDesc = features.schema("features").metadata
> {code}
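
To make the carry-over question above concrete, here is a minimal, self-contained sketch. It assumes simplified stand-in types rather than Spark's actual classes: the `select` method on `StructType`, the `MetadataSketch` object, and the metadata keys (`type`, `numCategories`, `categoryToIndex`) are all hypothetical names chosen for illustration. It shows only one possible behavior, in which a plain column projection reuses the original field and so keeps its metadata.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
sealed trait DataType
case object DoubleType extends DataType
case object StringType extends DataType

// The proposed shape: an optional metadata map that defaults to empty.
case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean,
    metadata: Map[String, Any] = Map.empty)

case class StructType(fields: Seq[StructField]) {
  // Look up a field by name, as in schema("features").
  def apply(name: String): StructField =
    fields.find(_.name == name).getOrElse(
      throw new IllegalArgumentException(s"Field $name does not exist"))

  // Hypothetical projection: selecting columns by name reuses the original
  // StructField instances, so their metadata is carried over unchanged.
  def select(names: String*): StructType =
    StructType(names.map(n => apply(n)))
}

object MetadataSketch {
  // Metadata keys below are assumptions, not an agreed-upon convention.
  val schema: StructType = StructType(Seq(
    StructField("label", DoubleType, nullable = false),
    StructField("color", StringType, nullable = true,
      metadata = Map(
        "type" -> "categorical",
        "numCategories" -> 3,
        "categoryToIndex" -> Map("red" -> 0, "green" -> 1, "blue" -> 2)))))

  // After projecting a single column, its description is still available.
  val projected: StructType = schema.select("color")
  val colorDesc: Map[String, Any] = projected("color").metadata
}
```

A simple projection like this is the easy case. Expressions that derive new columns (casts, arithmetic, aggregates) have no single source field to copy from, which is where the open question in the description applies.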



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

