[
https://issues.apache.org/jira/browse/FLINK-32115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xiaogang zhou updated FLINK-32115:
----------------------------------
Description:
[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]
Hive's json_tuple UDTF (linked above) caches previously deserialized JSON
objects. Could we consider using a similar object cache in JsonValueCallGen?
This optimization can improve the performance of SQL such as:
select
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),
...
(and many more such calls on the same column)
I have tested it with SQL like the following (keys are replaced for security reasons):
insert into blackhole
select
JSON_VALUE(`message`,'$.a') as scene,
JSON_VALUE(`message`,'$.b') as screen_height,
JSON_VALUE(`message`,'$.c') as longitude,
JSON_VALUE(`message`,'$.d') as device_id,
JSON_VALUE(`message`,'$.e') as receive_time,
JSON_VALUE(`message`,'$.f') as app_build,
JSON_VALUE(`message`,'$.g') as track_id,
JSON_VALUE(`message`,'$.h') as distinct_id
from xxx;
The cached UDF is about 2 times as fast as the non-cached UDF.
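The idea can be sketched as a single-entry cache that remembers the last input string and its parsed form, so repeated calls on the same row value parse only once. This is a minimal illustration, not the code generated by JsonValueCallGen: the class name LastValueParseCache is hypothetical, and the parser function is a stand-in for whatever JSON deserialization the runtime actually uses.

```java
import java.util.function.Function;

/**
 * Hypothetical single-entry parse cache (not part of Flink or Hive).
 * It keeps only the most recent input and its parsed result, which is
 * enough when several expressions in one SELECT read the same column.
 */
final class LastValueParseCache<T> {
    private final Function<String, T> parser; // stand-in for JSON deserialization
    private String lastInput;                 // most recently parsed input, or null
    private T lastParsed;                     // parsed form of lastInput
    private int parseCount;                   // number of real parser invocations

    LastValueParseCache(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Returns the parsed form of input, re-parsing only when the input changed. */
    T get(String input) {
        if (input == null) {
            return null;
        }
        if (!input.equals(lastInput)) {
            lastParsed = parser.apply(input);
            lastInput = input;
            parseCount++;
        }
        return lastParsed;
    }

    /** Exposed so a caller can observe how often the parser actually ran. */
    int parseCount() {
        return parseCount;
    }
}
```

With a cache like this, eight JSON_VALUE calls on the same `message` value would trigger one parse instead of eight. A single entry keeps memory bounded, at the cost of a miss whenever consecutive calls see different inputs.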
was:
[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]
Hive's json_tuple UDTF (linked above) caches previously deserialized JSON
objects. Could we consider using a similar object cache in JsonValueCallGen?
This optimization can improve the performance of SQL such as:
select
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),
...
(and many more such calls on the same column)
> json_value support cache
> ------------------------
>
> Key: FLINK-32115
> URL: https://issues.apache.org/jira/browse/FLINK-32115
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Runtime
> Affects Versions: 1.16.1
> Reporter: xiaogang zhou
> Priority: Major