[ 
https://issues.apache.org/jira/browse/SPARK-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-3569:
---------------------------------
    Description: 
Want to add a metadata field to StructField that can be used by other 
applications like ML to embed more information about the column.

{code}
case class StructField(name: String, dataType: DataType, nullable: 
Boolean, metadata: Map[String, Any] = Map.empty)
{code}

For ML, we can store feature information like categorical/continuous, number 
of categories, category-to-index map, etc.

One question is how to carry over the metadata in query execution. For example:

{code}
val features = schemaRDD.select('features)
val featuresDesc = features.schema("features").metadata
{code}
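
As a rough illustration of the question above, here is a minimal standalone sketch that does not use Spark itself. DataType, StructType, and select below are simplified stand-ins written for this example, and the metadata keys ("type", "numCategories", "categoryToIndex") are hypothetical names, not an agreed convention. It only shows one possible behaviour: a projection that reuses the original StructField objects, so the metadata travels with the column.

```scala
// Simplified stand-ins for Spark SQL types; these are NOT the real Spark classes.
sealed trait DataType
case object DoubleType extends DataType
case object StringType extends DataType

// StructField as proposed in the description, with metadata defaulting to empty.
case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean,
    metadata: Map[String, Any] = Map.empty)

// Hypothetical minimal schema: lookup by name plus a metadata-preserving projection.
case class StructType(fields: Seq[StructField]) {
  def apply(name: String): StructField =
    fields.find(_.name == name).getOrElse(
      throw new IllegalArgumentException(s"Field $name does not exist"))

  // Reuses the original StructField instances, so their metadata is carried over.
  def select(names: String*): StructType =
    StructType(names.map(n => apply(n)))
}

object MetadataSketch {
  // Example schema; the metadata keys are assumptions made for this sketch.
  val schema: StructType = StructType(Seq(
    StructField("label", DoubleType, nullable = false),
    StructField("color", StringType, nullable = true,
      metadata = Map(
        "type" -> "categorical",
        "numCategories" -> 3,
        "categoryToIndex" -> Map("red" -> 0, "green" -> 1, "blue" -> 2)))))

  def main(args: Array[String]): Unit = {
    val projected = schema.select("color")
    val colorDesc = projected("color").metadata
    println(colorDesc("type"))
    println(colorDesc("numCategories"))
  }
}
```

Expressions that derive a new column (casts, arithmetic, aggregates) have no single source field to copy from, so whether and how metadata should propagate in those cases remains the open part of the question.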

  was:
Want to add a metadata field to StructField that can be used by other 
applications like ML to embed more information about the column.

{code}
case class case class StructField(name: String, dataType: DataType, nullable: 
Boolean, metadata: Map[String, Any] = Map.empty)
{code}

For ML, we can store feature information like categorical/continuous, number 
categories, category-to-index map, etc



> Add metadata field to StructField
> ---------------------------------
>
>                 Key: SPARK-3569
>                 URL: https://issues.apache.org/jira/browse/SPARK-3569
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib, SQL
>            Reporter: Xiangrui Meng
>
> Want to add a metadata field to StructField that can be used by other 
> applications like ML to embed more information about the column.
> {code}
> case class StructField(name: String, dataType: DataType, nullable: 
> Boolean, metadata: Map[String, Any] = Map.empty)
> {code}
> For ML, we can store feature information like categorical/continuous, number 
> of categories, category-to-index map, etc.
> One question is how to carry over the metadata in query execution. For 
> example:
> {code}
> val features = schemaRDD.select('features)
> val featuresDesc = features.schema("features").metadata
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
