Amanda Liu created SPARK-50541:
----------------------------------

             Summary: Describe Table As JSON
                 Key: SPARK-50541
                 URL: https://issues.apache.org/jira/browse/SPARK-50541
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Amanda Liu


Support an optional [AS JSON] clause for DESCRIBE TABLE that returns table
metadata in JSON format.

 

*Context:*

The Spark SQL command DESCRIBE TABLE displays table metadata as a DataFrame
laid out for human readers. That layout is hard to parse reliably, e.g. when
fields contain special characters or when the layout changes as new features
are added. [DBT|https://www.getdbt.com/] is an example customer that motivates
this proposal: a structured JSON format can help prevent breakages in
pipelines that depend on parsing table metadata.

 

The new AS JSON option would return the table metadata as a JSON string that
is machine-parseable and extensible with minimal risk of breaking changes. It
is not meant to be human-readable.
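
As a rough illustration of the intended machine consumption, a client could decode the returned string with a standard JSON parser. The payload below is hand-written to resemble the proposed schema. The table name, the comment text, and the shape of the column "type" object are assumptions made for this example, and fetching the string from Spark is left out.

```python
import json

# Hand-written sample standing in for the string that AS JSON would return.
# All values, including the shape of the "type" object, are illustrative
# assumptions and not real Spark output.
sample = json.dumps({
    "table_name": "events",
    "type": "MANAGED",
    "comment": 'contains "quotes", commas, and\nnewlines',
    "columns": [{"id": 1, "name": "user id", "type": {"name": "int"}}],
})

metadata = json.loads(sample)

# Special characters survive the round trip without custom escaping rules.
print(metadata["table_name"])
print(metadata["columns"][0]["name"])
```

The point of the sketch is that quoting and escaping are handled by the JSON encoding itself, so a consumer does not need layout-specific parsing rules.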

 

*SQL Ref Spec:*

{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name { [
PARTITION clause ] | [ column_name ] } [ AS JSON ]
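
To make the grammar concrete, here is a minimal sketch of a client-side helper that assembles statements of this form. The function `describe_statement` and its parameters are hypothetical names introduced only for illustration. The sketch treats the partition clause and the column name as mutually exclusive, following the alternation in the spec above, and does not validate identifiers.

```python
from typing import Optional


def describe_statement(
    table_name: str,
    extended: bool = False,
    partition_clause: Optional[str] = None,
    column_name: Optional[str] = None,
    as_json: bool = False,
) -> str:
    """Assemble a DESCRIBE statement following the spec above (hypothetical helper)."""
    if partition_clause and column_name:
        # The spec lists the PARTITION clause and column_name as alternatives.
        raise ValueError("give either a partition clause or a column name, not both")
    parts = ["DESCRIBE", "TABLE"]
    if extended:
        parts.append("EXTENDED")
    parts.append(table_name)
    if partition_clause:
        parts.append(partition_clause)
    if column_name:
        parts.append(column_name)
    if as_json:
        parts.append("AS JSON")
    return " ".join(parts)


print(describe_statement("sales.orders", extended=True, as_json=True))
# prints: DESCRIBE TABLE EXTENDED sales.orders AS JSON
```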

 

*JSON Schema:*

```
{
"table_name": "<table_name>",
"catalog_name": [...],
"database_name": [...],
"qualified_name": "<qualified_name>",
"type": "<table_type>",
"provider": "<provider>",
"columns": [
{
"id": 1,
"name": "<name>",
"type": <type_json>,
"comment": "<comment>",
"default": "<default_val>"
}
],
"partition_values": {
"<col_name>": "<val>"
},
"location": "<path>",
"view_definition": "<view_defn>",
"owner": "<owner>",
"comment": "<comment>",
"table_properties": {
"property1": "<property1>",
"property2": "<property2>"
},
"storage_properties": {
"property1": "<property1>",
"property2": "<property2>"
},
"serde_library": "<serde_library>",
"inputformat": "<input_format>",
"outputformat": "<output_format>",
"bucket_columns": [<col_name>],
"sort_columns": [<col_name>],
"created_time": "<timestamp>",
"last_access": "<timestamp>",
"partition_provider": "<partition_provider>"
}
```
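
A consumer of this schema might read only the fields it needs and ignore the rest, which is what keeps it working as fields are added. The sketch below pulls column names and comments from a hand-written payload. The helper `column_summaries` is a hypothetical name, the sample values are made up, and the sketch assumes that optional fields such as "comment" may simply be absent.

```python
import json
from typing import Any, Dict, List


def column_summaries(describe_json: str) -> List[Dict[str, Any]]:
    """Return name, type, and comment for each column in the payload."""
    metadata = json.loads(describe_json)
    summaries = []
    for col in metadata.get("columns", []):
        summaries.append({
            "name": col["name"],
            "type": col.get("type"),
            "comment": col.get("comment"),  # None when the field is omitted
        })
    return summaries


# Hand-written sample; values and the extra field are illustrative assumptions.
sample = json.dumps({
    "table_name": "orders",
    "columns": [
        {"id": 1, "name": "order_id", "type": {"name": "bigint"}, "comment": "primary key"},
        {"id": 2, "name": "note", "type": {"name": "string"}},
    ],
    "some_future_field": "ignored by this reader",
})

for summary in column_summaries(sample):
    print(summary["name"], summary["comment"])
```

Reading by key rather than by position means an unknown top-level field, like the made-up `some_future_field` here, does not affect the result.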



