[
https://issues.apache.org/jira/browse/HIVE-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy updated HIVE-20448:
-------------------------------------
Description:
Hive's implementation of get_json_object() unnecessarily unnests array values,
e.g.:
{noformat}
hive> select get_json_object('[{"key": [1,2,3]}, {"key": [4,5]}]', '$[*].key');
OK
[1,2,3,4,5]{noformat}
I think for the above query the following output is more reasonable:
{noformat}
[[1,2,3],[4,5]]{noformat}
This is also the output that one gets from
* [https://jsonpath.curiousconcept.com/]
* [http://jsonpath.com/]
* [https://jsonpath.herokuapp.com/]
Python's jsonpath package also produces this output.
MySQL's JSON_EXTRACT() also works like that.
I tested it with Hive 2.1.1, but it is likely that it affects other versions as
well.
If we'd really want to unnest the arrays, then '$[*].key[*]' is the way to go.
It produces the same output everywhere.
was:
Hive's implementation of get_json_object() unnecessarily unnests array values,
e.g.:
{noformat}
hive> select get_json_object('[{"key": [1,2,3]}, {"key": [4,5]}]', '$[*].key');
OK
[1,2,3,4,5]{noformat}
I think for the above query the following output is more reasonable:
{noformat}
[[1,2,3],[4,5]]{noformat}
This is also the output that one gets from
* [https://jsonpath.curiousconcept.com/]
* [http://jsonpath.com/]
* [https://jsonpath.herokuapp.com/]
Python's jsonpath package also produces this output.
I tested it with Hive 2.1.1, but it is likely that it affects other versions as
well.
If we'd really want to unnest the arrays, then '$[*].key[*]' is the way to go.
It produces the same output everywhere.
> get_json_object() unnecessarily unnests array values
> ----------------------------------------------------
>
> Key: HIVE-20448
> URL: https://issues.apache.org/jira/browse/HIVE-20448
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Priority: Major
>
> Hive's implementation of get_json_object() unnecessarily unnests array
> values, e.g.:
> {noformat}
> hive> select get_json_object('[{"key": [1,2,3]}, {"key": [4,5]}]',
> '$[*].key');
> OK
> [1,2,3,4,5]{noformat}
> I think for the above query the following output is more reasonable:
> {noformat}
> [[1,2,3],[4,5]]{noformat}
> This is also the output that one gets from
> * [https://jsonpath.curiousconcept.com/]
> * [http://jsonpath.com/]
> * [https://jsonpath.herokuapp.com/]
> Python's jsonpath package also produces this output.
> MySQL's JSON_EXTRACT() also works like that.
> I tested it with Hive 2.1.1, but it is likely that it affects other versions
> as well.
> If we'd really want to unnest the arrays, then '$[*].key[*]' is the way to
> go. It produces the same output everywhere.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)