Thanks for your reply, friends! Base on the discussion, we'll choose option c. I'll continue to implement Hive's get_json_object function in IMPALA-376. For implementing the ANSI JSON functions, I created a new JIRA (IMPALA-7609) to track it.
If anyone has more comments, feel free to reply in this thread or in the JIRAs. Thanks, Quanlong On Tue, Sep 11, 2018 at 12:29 AM Brock Noland <br...@phdata.io> wrote: > +1 for option c > > On Mon, Sep 10, 2018 at 10:34 AM, Tim Armstrong <tarmstr...@cloudera.com> > wrote: > >> I think I agree with Lars. Having the (mostly) Hive-compatible version is >> useful if there are shared views between Hive/Impala and for people >> migrating queries from Hive or some of the Impala JSON UDFs that I've seen >> floating around. >> >> On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker <l...@cloudera.com> wrote: >> >> > Thanks for the comprehensive summary, Quanlong. >> > >> > I'm in favor of (c) but I don't feel strongly about the order in which >> we >> > should prioritize them. Being compatible to Hive and following the SQL >> > standard will help users to switch more easily between systems. I'm not >> > worried about the confusing users too much. Even if the functions are >> have >> > similar names, people are not going to confuse them by accident, and we >> can >> > add a warning to the docs to point out that get_json_object should not >> be >> > confused with the ANSI SQL function json_object. >> > >> > Cheers, Lars >> > >> > On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang <huangquanl...@gmail.com >> > >> > wrote: >> > >> >> Hi all, >> >> >> >> Recently, I'm working on a patch to add a built-in function for parsing >> >> JSON strings and extracting values in them: https://gerrit.cloudera. >> >> org/#/c/10950. We've not reached an agreement on the name of this >> >> function. So I reach here for a broader discussion. >> >> >> >> *-- Background --* >> >> >> >> Hive has a built-in function (get_json_object) for this purpose. But >> it's >> >> a Java UDF. Impala can't track its memory usage. The current patch adds >> >> built-in support for this function. >> >> >> >> Greg suggested that we name this function as json_value(). It's a >> >> function in ANSI SQL standard. Note that json_value() only returns >> scalar >> >> types so json_query() in the standard may be more fit. >> >> >> >> The discussion is about which of the following options we should choose >> >> for JSON support: >> >> >> >> (a) Support get_json_object for compatibility to Hive; >> >> (b) Support JSON functions in ANSI SQL standard; >> >> (c) Support both. >> >> >> >> I used to be in favor of (c). But Zoltan points out there's a >> >> json_object() function in ANSI standard with a quite different meaning >> >> which will confuse users with the Hive get_json_object UDF. Maybe we >> can >> >> only choose (a) or (b). >> >> >> >> *-- SQL supports in other systems --* >> >> >> >> Oracle supports JSON functions in ANSI standard, while MySQL does not. >> >> There's a JSON_EXTRACT function in MySQL contains all the ability of >> Hive's >> >> get_json_object function. >> >> >> >> Presto and Greenplum do not support ANSI JSON standards too: >> >> https://prestodb.io/docs/current/functions/json.html, >> >> >> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html >> >> >> >> Here're the previous discussions: https://gerrit. >> >> cloudera.org/#/c/10950/13/common/function-registry/ >> >> impala_functions.py@514 >> >> >> >> What do you think? >> >> >> >> Thanks, >> >> Quanlong >> >> >> > >> > >