I think I agree with Lars. Having the (mostly) Hive-compatible version is useful if there are shared views between Hive/Impala and for people migrating queries from Hive or some of the Impala JSON UDFs that I've seen floating around.
On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker <l...@cloudera.com> wrote: > Thanks for the comprehensive summary, Quanlong. > > I'm in favor of (c) but I don't feel strongly about the order in which we > should prioritize them. Being compatible to Hive and following the SQL > standard will help users to switch more easily between systems. I'm not > worried about the confusing users too much. Even if the functions are have > similar names, people are not going to confuse them by accident, and we can > add a warning to the docs to point out that get_json_object should not be > confused with the ANSI SQL function json_object. > > Cheers, Lars > > On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang <huangquanl...@gmail.com> > wrote: > >> Hi all, >> >> Recently, I'm working on a patch to add a built-in function for parsing >> JSON strings and extracting values in them: https://gerrit.cloudera. >> org/#/c/10950. We've not reached an agreement on the name of this >> function. So I reach here for a broader discussion. >> >> *-- Background --* >> >> Hive has a built-in function (get_json_object) for this purpose. But it's >> a Java UDF. Impala can't track its memory usage. The current patch adds >> built-in support for this function. >> >> Greg suggested that we name this function as json_value(). It's a >> function in ANSI SQL standard. Note that json_value() only returns scalar >> types so json_query() in the standard may be more fit. >> >> The discussion is about which of the following options we should choose >> for JSON support: >> >> (a) Support get_json_object for compatibility to Hive; >> (b) Support JSON functions in ANSI SQL standard; >> (c) Support both. >> >> I used to be in favor of (c). But Zoltan points out there's a >> json_object() function in ANSI standard with a quite different meaning >> which will confuse users with the Hive get_json_object UDF. Maybe we can >> only choose (a) or (b). >> >> *-- SQL supports in other systems --* >> >> Oracle supports JSON functions in ANSI standard, while MySQL does not. >> There's a JSON_EXTRACT function in MySQL contains all the ability of Hive's >> get_json_object function. >> >> Presto and Greenplum do not support ANSI JSON standards too: >> https://prestodb.io/docs/current/functions/json.html, >> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html >> >> Here're the previous discussions: https://gerrit. >> cloudera.org/#/c/10950/13/common/function-registry/ >> impala_functions.py@514 >> >> What do you think? >> >> Thanks, >> Quanlong >> >