Maxim Gekk created SPARK-24643:
----------------------------------
Summary: from_json should accept an aggregate function as schema
Key: SPARK-24643
URL: https://issues.apache.org/jira/browse/SPARK-24643
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.3.1
Reporter: Maxim Gekk
Currently, the *from_json()* function accepts only a string literal as its schema argument:
- The schema argument is checked inside JsonToStructs:
[https://github.com/apache/spark/blob/b8f27ae3b34134a01998b77db4b7935e7f82a4fe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L530]
- Only a string literal is accepted:
[https://github.com/apache/spark/blob/b8f27ae3b34134a01998b77db4b7935e7f82a4fe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L749-L752]
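For comparison, here is a minimal sketch of the form that is accepted today, with the schema passed as a string literal. The JSON value and the DDL string 'a INT' are illustrative assumptions, not taken from the linked code:
{code:sql}
-- The schema is a constant string literal, known when the query is analyzed.
select from_json('{"a":1}', 'a INT');
{code}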
JsonToStructs should be modified to accept the result of an aggregate function such as
*infer_schema* (see SPARK-24642). It should be possible to write SQL like:
{code:sql}
select from_json(json_col, infer_schema(json_col)) from json_table
{code}
Here is a test case that uses an existing aggregate function, *first()*:
{code:sql}
create temporary view schemas(schema) as select * from values
('struct<a:int>'),
('map<string,int>');
select from_json('{"a":1}', first(schema)) from schemas;
{code}