+1 for option c

On Mon, Sep 10, 2018 at 10:34 AM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> I think I agree with Lars. Having the (mostly) Hive-compatible version is
> useful if there are shared views between Hive/Impala and for people
> migrating queries from Hive or some of the Impala JSON UDFs that I've seen
> floating around.
>
> On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker <l...@cloudera.com> wrote:
>
> > Thanks for the comprehensive summary, Quanlong.
> >
> > I'm in favor of (c) but I don't feel strongly about the order in which we
> > should prioritize them. Being compatible to Hive and following the SQL
> > standard will help users to switch more easily between systems. I'm not
> > worried about the confusing users too much. Even if the functions are
> have
> > similar names, people are not going to confuse them by accident, and we
> can
> > add a warning to the docs to point out that get_json_object should not be
> > confused with the ANSI SQL function json_object.
> >
> > Cheers, Lars
> >
> > On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang <huangquanl...@gmail.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> Recently, I'm working on a patch to add a built-in function for parsing
> >> JSON strings and extracting values in them: https://gerrit.cloudera.
> >> org/#/c/10950. We've not reached an agreement on the name of this
> >> function. So I reach here for a broader discussion.
> >>
> >> *-- Background --*
> >>
> >> Hive has a built-in function (get_json_object) for this purpose. But
> it's
> >> a Java UDF. Impala can't track its memory usage. The current patch adds
> >> built-in support for this function.
> >>
> >> Greg suggested that we name this function as json_value(). It's a
> >> function in ANSI SQL standard. Note that json_value() only returns
> scalar
> >> types so json_query() in the standard may be more fit.
> >>
> >> The discussion is about which of the following options we should choose
> >> for JSON support:
> >>
> >> (a) Support get_json_object for compatibility to Hive;
> >> (b) Support JSON functions in ANSI SQL standard;
> >> (c) Support both.
> >>
> >> I used to be in favor of (c). But Zoltan points out there's a
> >> json_object() function in ANSI standard with a quite different meaning
> >> which will confuse users with the Hive get_json_object UDF. Maybe we can
> >> only choose (a) or (b).
> >>
> >> *-- SQL supports in other systems --*
> >>
> >> Oracle supports JSON functions in ANSI standard, while MySQL does not.
> >> There's a JSON_EXTRACT function in MySQL contains all the ability of
> Hive's
> >> get_json_object function.
> >>
> >> Presto and Greenplum do not support ANSI JSON standards too:
> >> https://prestodb.io/docs/current/functions/json.html,
> >> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/
> json-data.html
> >>
> >> Here're the previous discussions: https://gerrit.
> >> cloudera.org/#/c/10950/13/common/function-registry/
> >> impala_functions.py@514
> >>
> >> What do you think?
> >>
> >> Thanks,
> >> Quanlong
> >>
> >
>

Reply via email to