I think I agree with Lars. Having the (mostly) Hive-compatible version is
useful if there are shared views between Hive/Impala and for people
migrating queries from Hive or some of the Impala JSON UDFs that I've seen
floating around.

On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker <l...@cloudera.com> wrote:

> Thanks for the comprehensive summary, Quanlong.
>
> I'm in favor of (c) but I don't feel strongly about the order in which we
> should prioritize them. Being compatible to Hive and following the SQL
> standard will help users to switch more easily between systems. I'm not
> worried about the confusing users too much. Even if the functions are have
> similar names, people are not going to confuse them by accident, and we can
> add a warning to the docs to point out that get_json_object should not be
> confused with the ANSI SQL function json_object.
>
> Cheers, Lars
>
> On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang <huangquanl...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Recently, I'm working on a patch to add a built-in function for parsing
>> JSON strings and extracting values in them: https://gerrit.cloudera.
>> org/#/c/10950. We've not reached an agreement on the name of this
>> function. So I reach here for a broader discussion.
>>
>> *-- Background --*
>>
>> Hive has a built-in function (get_json_object) for this purpose. But it's
>> a Java UDF. Impala can't track its memory usage. The current patch adds
>> built-in support for this function.
>>
>> Greg suggested that we name this function as json_value(). It's a
>> function in ANSI SQL standard. Note that json_value() only returns scalar
>> types so json_query() in the standard may be more fit.
>>
>> The discussion is about which of the following options we should choose
>> for JSON support:
>>
>> (a) Support get_json_object for compatibility to Hive;
>> (b) Support JSON functions in ANSI SQL standard;
>> (c) Support both.
>>
>> I used to be in favor of (c). But Zoltan points out there's a
>> json_object() function in ANSI standard with a quite different meaning
>> which will confuse users with the Hive get_json_object UDF. Maybe we can
>> only choose (a) or (b).
>>
>> *-- SQL supports in other systems --*
>>
>> Oracle supports JSON functions in ANSI standard, while MySQL does not.
>> There's a JSON_EXTRACT function in MySQL contains all the ability of Hive's
>> get_json_object function.
>>
>> Presto and Greenplum do not support ANSI JSON standards too:
>> https://prestodb.io/docs/current/functions/json.html,
>> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html
>>
>> Here're the previous discussions: https://gerrit.
>> cloudera.org/#/c/10950/13/common/function-registry/
>> impala_functions.py@514
>>
>> What do you think?
>>
>> Thanks,
>> Quanlong
>>
>

Reply via email to