[ 
https://issues.apache.org/jira/browse/HIVE-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated HIVE-20448:
-------------------------------------
    Description: 
Hive's implementation of get_json_object() unnecessarily unnests array values, 
e.g.:
{noformat}
hive> select get_json_object('[{"key": [1,2,3]}, {"key": [4,5]}]', '$[*].key');
OK
[1,2,3,4,5]{noformat}
I think for the above query the following output is more reasonable:
{noformat}
[[1,2,3],[4,5]]{noformat}
This is also the output that one gets from
 * [https://jsonpath.curiousconcept.com/] 
 * [http://jsonpath.com/]
 * [https://jsonpath.herokuapp.com/]

Python's jsonpath package also produces this output.

MySQL's JSON_EXTRACT() also works like that.

I tested it with Hive 2.1.1, but it is likely that it affects other versions as 
well.

If we'd really want to unnest the arrays, then '$[*].key[*]' is the way to go. 
It produces the same output everywhere.

  was:
Hive's implementation of get_json_object() unnecessarily unnests array values, 
e.g.:
{noformat}
hive> select get_json_object('[{"key": [1,2,3]}, {"key": [4,5]}]', '$[*].key');
OK
[1,2,3,4,5]{noformat}
I think for the above query the following output is more reasonable:
{noformat}
[[1,2,3],[4,5]]{noformat}
This is also the output that one gets from
 * [https://jsonpath.curiousconcept.com/] 
 * [http://jsonpath.com/]
 * [https://jsonpath.herokuapp.com/]

Python's jsonpath package also produces this output.

I tested it with Hive 2.1.1, but it is likely that it affects other versions as 
well.

If we'd really want to unnest the arrays, then '$[*].key[*]' is the way to go. 
It produces the same output everywhere.


> get_json_object() unnecessarily unnests array values
> ----------------------------------------------------
>
>                 Key: HIVE-20448
>                 URL: https://issues.apache.org/jira/browse/HIVE-20448
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> Hive's implementation of get_json_object() unnecessarily unnests array 
> values, e.g.:
> {noformat}
> hive> select get_json_object('[{"key": [1,2,3]}, {"key": [4,5]}]', 
> '$[*].key');
> OK
> [1,2,3,4,5]{noformat}
> I think for the above query the following output is more reasonable:
> {noformat}
> [[1,2,3],[4,5]]{noformat}
> This is also the output that one gets from
>  * [https://jsonpath.curiousconcept.com/] 
>  * [http://jsonpath.com/]
>  * [https://jsonpath.herokuapp.com/]
> Python's jsonpath package also produces this output.
> MySQL's JSON_EXTRACT() also works like that.
> I tested it with Hive 2.1.1, but it is likely that it affects other versions 
> as well.
> If we'd really want to unnest the arrays, then '$[*].key[*]' is the way to 
> go. It produces the same output everywhere.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to