Re: JSON support: Hive compatibility or ANSI SQL standard

Quanlong Huang Fri, 21 Sep 2018 19:38:35 -0700

Thanks for your reply, friends!

Base on the discussion, we'll choose option c. I'll continue to implement
Hive's get_json_object function in IMPALA-376. For implementing the ANSI
JSON functions, I created a new JIRA (IMPALA-7609) to track it.


If anyone has more comments, feel free to reply in this thread or in the
JIRAs.

Thanks,
Quanlong

On Tue, Sep 11, 2018 at 12:29 AM Brock Noland <[email protected]> wrote:

> +1 for option c
>
> On Mon, Sep 10, 2018 at 10:34 AM, Tim Armstrong <[email protected]>
> wrote:
>
>> I think I agree with Lars. Having the (mostly) Hive-compatible version is
>> useful if there are shared views between Hive/Impala and for people
>> migrating queries from Hive or some of the Impala JSON UDFs that I've seen
>> floating around.
>>
>> On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker <[email protected]> wrote:
>>
>> > Thanks for the comprehensive summary, Quanlong.
>> >
>> > I'm in favor of (c) but I don't feel strongly about the order in which
>> we
>> > should prioritize them. Being compatible to Hive and following the SQL
>> > standard will help users to switch more easily between systems. I'm not
>> > worried about the confusing users too much. Even if the functions are
>> have
>> > similar names, people are not going to confuse them by accident, and we
>> can
>> > add a warning to the docs to point out that get_json_object should not
>> be
>> > confused with the ANSI SQL function json_object.
>> >
>> > Cheers, Lars
>> >
>> > On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang <[email protected]
>> >
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> Recently, I'm working on a patch to add a built-in function for parsing
>> >> JSON strings and extracting values in them: https://gerrit.cloudera.
>> >> org/#/c/10950. We've not reached an agreement on the name of this
>> >> function. So I reach here for a broader discussion.
>> >>
>> >> *-- Background --*
>> >>
>> >> Hive has a built-in function (get_json_object) for this purpose. But
>> it's
>> >> a Java UDF. Impala can't track its memory usage. The current patch adds
>> >> built-in support for this function.
>> >>
>> >> Greg suggested that we name this function as json_value(). It's a
>> >> function in ANSI SQL standard. Note that json_value() only returns
>> scalar
>> >> types so json_query() in the standard may be more fit.
>> >>
>> >> The discussion is about which of the following options we should choose
>> >> for JSON support:
>> >>
>> >> (a) Support get_json_object for compatibility to Hive;
>> >> (b) Support JSON functions in ANSI SQL standard;
>> >> (c) Support both.
>> >>
>> >> I used to be in favor of (c). But Zoltan points out there's a
>> >> json_object() function in ANSI standard with a quite different meaning
>> >> which will confuse users with the Hive get_json_object UDF. Maybe we
>> can
>> >> only choose (a) or (b).
>> >>
>> >> *-- SQL supports in other systems --*
>> >>
>> >> Oracle supports JSON functions in ANSI standard, while MySQL does not.
>> >> There's a JSON_EXTRACT function in MySQL contains all the ability of
>> Hive's
>> >> get_json_object function.
>> >>
>> >> Presto and Greenplum do not support ANSI JSON standards too:
>> >> https://prestodb.io/docs/current/functions/json.html,
>> >>
>> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html
>> >>
>> >> Here're the previous discussions: https://gerrit.
>> >> cloudera.org/#/c/10950/13/common/function-registry/
>> >> impala_functions.py@514
>> >>
>> >> What do you think?
>> >>
>> >> Thanks,
>> >> Quanlong
>> >>
>> >
>>
>
>

Re: JSON support: Hive compatibility or ANSI SQL standard

Reply via email to