+1 for option c On Mon, Sep 10, 2018 at 10:34 AM, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> I think I agree with Lars. Having the (mostly) Hive-compatible version is > useful if there are shared views between Hive/Impala and for people > migrating queries from Hive or some of the Impala JSON UDFs that I've seen > floating around. > > On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker <l...@cloudera.com> wrote: > > > Thanks for the comprehensive summary, Quanlong. > > > > I'm in favor of (c) but I don't feel strongly about the order in which we > > should prioritize them. Being compatible to Hive and following the SQL > > standard will help users to switch more easily between systems. I'm not > > worried about the confusing users too much. Even if the functions are > have > > similar names, people are not going to confuse them by accident, and we > can > > add a warning to the docs to point out that get_json_object should not be > > confused with the ANSI SQL function json_object. > > > > Cheers, Lars > > > > On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang <huangquanl...@gmail.com> > > wrote: > > > >> Hi all, > >> > >> Recently, I'm working on a patch to add a built-in function for parsing > >> JSON strings and extracting values in them: https://gerrit.cloudera. > >> org/#/c/10950. We've not reached an agreement on the name of this > >> function. So I reach here for a broader discussion. > >> > >> *-- Background --* > >> > >> Hive has a built-in function (get_json_object) for this purpose. But > it's > >> a Java UDF. Impala can't track its memory usage. The current patch adds > >> built-in support for this function. > >> > >> Greg suggested that we name this function as json_value(). It's a > >> function in ANSI SQL standard. Note that json_value() only returns > scalar > >> types so json_query() in the standard may be more fit. > >> > >> The discussion is about which of the following options we should choose > >> for JSON support: > >> > >> (a) Support get_json_object for compatibility to Hive; > >> (b) Support JSON functions in ANSI SQL standard; > >> (c) Support both. > >> > >> I used to be in favor of (c). But Zoltan points out there's a > >> json_object() function in ANSI standard with a quite different meaning > >> which will confuse users with the Hive get_json_object UDF. Maybe we can > >> only choose (a) or (b). > >> > >> *-- SQL supports in other systems --* > >> > >> Oracle supports JSON functions in ANSI standard, while MySQL does not. > >> There's a JSON_EXTRACT function in MySQL contains all the ability of > Hive's > >> get_json_object function. > >> > >> Presto and Greenplum do not support ANSI JSON standards too: > >> https://prestodb.io/docs/current/functions/json.html, > >> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/ > json-data.html > >> > >> Here're the previous discussions: https://gerrit. > >> cloudera.org/#/c/10950/13/common/function-registry/ > >> impala_functions.py@514 > >> > >> What do you think? > >> > >> Thanks, > >> Quanlong > >> > > >