[jira] [Updated] (FLINK-32115) json_value support cache

xiaogang zhou (Jira) Wed, 17 May 2023 02:13:08 -0700


     [ https://issues.apache.org/jira/browse/FLINK-32115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


xiaogang zhou updated FLINK-32115:
----------------------------------
    Description: 
[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]

 

Hive caches the JSON object from a previously deserialized value. Could we 
consider using a cached object in JsonValueCallGen? 

 

This optimization can improve the performance of SQL that calls json_value 
many times on the same column, such as:

select
  json_value(A, 'xxx'),
  json_value(A, 'yyy'),
  json_value(A, 'zzz'),
  ...
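To illustrate the idea, here is a minimal sketch in Python. It is not Flink's JsonValueCallGen or Hive's implementation. The names parse_cached and json_value, the cache size of 16, and the sample message are all assumptions made for this example. It only shows that several lookups against the same JSON string can share a single parse.

```python
import json
from functools import lru_cache


# Hypothetical helper (not a Flink or Hive API): parse each distinct JSON
# string once and reuse the result while it stays in a small bounded cache.
@lru_cache(maxsize=16)
def parse_cached(raw: str):
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # treat malformed input as "no value"


def json_value(raw: str, path: str):
    """Simplified stand-in for JSON_VALUE: only top-level '$.key' paths."""
    doc = parse_cached(raw)
    if not isinstance(doc, dict) or not path.startswith("$."):
        return None
    value = doc.get(path[2:])
    # Only scalars are returned; objects, arrays and null map to None.
    if value is None or isinstance(value, (dict, list)):
        return None
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


message = '{"a": "home", "b": 1920, "c": 121.47}'
row = [json_value(message, p) for p in ("$.a", "$.b", "$.c")]
print(row)
# One parse (miss) followed by two reuses (hits) for the three lookups.
print(parse_cached.cache_info())
```

The cached object is shared between calls, so callers must treat it as read-only. A real implementation would also need to decide how the cache is scoped (per operator, per record) and how large it may grow.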

 

 

I have tested it with SQL like the following (keys are replaced for security reasons):

 

insert into blackhole 
select

   JSON_VALUE(`message`,'$.a') as scene,
   JSON_VALUE(`message`,'$.b') as screen_height,
   JSON_VALUE(`message`,'$.c') as longitude,
   JSON_VALUE(`message`,'$.d') as device_id,
   JSON_VALUE(`message`,'$.e') as receive_time,
   JSON_VALUE(`message`,'$.f') as app_build,
   JSON_VALUE(`message`,'$.g') as track_id,
   JSON_VALUE(`message`,'$.h') as distinct_id
from xxx;

 

The cached UDF is about 2 times as fast as the non-cached UDF.

  was:
[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]

 

Hive caches the JSON object from a previously deserialized value. Could we 
consider using a cached object in JsonValueCallGen? 

This optimization can improve the performance of SQL that calls json_value 
many times on the same column, such as:

select
  json_value(A, 'xxx'),
  json_value(A, 'yyy'),
  json_value(A, 'zzz'),
  ...


> json_value support cache
> ------------------------
>
>                 Key: FLINK-32115
>                 URL: https://issues.apache.org/jira/browse/FLINK-32115
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.16.1
>            Reporter: xiaogang zhou
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
