Hi all,

Recently, I'm working on a patch to add a built-in function for parsing
JSON strings and extracting values in them:
https://gerrit.cloudera.org/#/c/10950. We've not reached an agreement on
the name of this function. So I reach here for a broader discussion.

*-- Background --*

Hive has a built-in function (get_json_object) for this purpose. But it's a
Java UDF. Impala can't track its memory usage. The current patch adds
built-in support for this function.

Greg suggested that we name this function as json_value(). It's a function
in ANSI SQL standard. Note that json_value() only returns scalar types so
json_query() in the standard may be more fit.

The discussion is about which of the following options we should choose for
JSON support:

(a) Support get_json_object for compatibility to Hive;
(b) Support JSON functions in ANSI SQL standard;
(c) Support both.

I used to be in favor of (c). But Zoltan points out there's a json_object()
function in ANSI standard with a quite different meaning which will confuse
users with the Hive get_json_object UDF. Maybe we can only choose (a) or
(b).

*-- SQL supports in other systems --*

Oracle supports JSON functions in ANSI standard, while MySQL does not.
There's a JSON_EXTRACT function in MySQL contains all the ability of Hive's
get_json_object function.

Presto and Greenplum do not support ANSI JSON standards too:
https://prestodb.io/docs/current/functions/json.html,
https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html

Here're the previous discussions:
https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514

What do you think?

Thanks,
Quanlong

Reply via email to