[
https://issues.apache.org/jira/browse/FLINK-32115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xiaogang zhou updated FLINK-32115:
----------------------------------
Description:
[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]
Hive supports caching the previously deserialized JSON object. Could we
consider using an object cache in JsonValueCallGen?
This optimization can improve the performance of SQL like
select
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),
...
(and many more such calls on the same column)
I added a static LRU cache to SqlJsonUtils and refactored
jsonValueExpression1 as follows:
{code:java}
private static JsonValueContext jsonValueExpression1(String input) {
    // Return the previously parsed result if this exact input was seen before.
    JsonValueContext parsedJsonContext = EXTRACT_OBJECT_CACHE.get(input);
    if (parsedJsonContext != null) {
        return parsedJsonContext;
    }
    try {
        parsedJsonContext = JsonValueContext.withJavaObj(dejsonize(input));
    } catch (Exception e) {
        // Failures are cached as well, so malformed input is not re-parsed.
        parsedJsonContext = JsonValueContext.withException(e);
    }
    EXTRACT_OBJECT_CACHE.put(input, parsedJsonContext);
    return parsedJsonContext;
}
{code}
and benchmarked it as follows:
{code:java}
public static void main(String[] args) {
    String input =
            "{\"social\":[{\"weibo\":\"https://weibo.com/xiaoming\"},{\"github\":\"https://github.com/xiaoming\"}]}";
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        // Same input on every iteration, so the cached variant parses only once.
        jsonValueExpression1(input);
    }
    // Elapsed wall-clock time in milliseconds.
    System.err.println(System.currentTimeMillis() - start);
}
{code}
The time taken by the two benchmark runs:
||Case||Time taken (ms)||
|cache|33|
|no cache|1591|
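
The declaration of EXTRACT_OBJECT_CACHE is not shown above. As a rough sketch only, a bounded LRU cache of this kind could be built on java.util.LinkedHashMap in access order. The class name LruParseCache, the method getOrParse, and the capacity argument are hypothetical and are not taken from Flink or from the actual patch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the cache described above; not Flink's actual code. */
final class LruParseCache<V> {
    private final Map<String, V> cache;

    LruParseCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        // synchronizedMap is needed because get() also reorders entries.
        this.cache = Collections.synchronizedMap(
                new LinkedHashMap<String, V>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                        // Evict the least recently used entry once the bound is exceeded.
                        return size() > maxEntries;
                    }
                });
    }

    /** Returns the cached value for input, calling parser only on a miss. */
    V getOrParse(String input, Function<String, V> parser) {
        V cached = cache.get(input);
        if (cached != null) {
            return cached;
        }
        // get-then-put is not atomic: two threads may parse the same input once each,
        // which costs time but does not corrupt the map.
        V parsed = parser.apply(input);
        cache.put(input, parsed);
        return parsed;
    }

    int size() {
        return cache.size();
    }
}
```

With a cache like this, several json_value calls on the same input string would parse it once and reuse the result, at the price of keeping up to maxEntries parsed objects alive.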
> json_value support cache
> ------------------------
>
> Key: FLINK-32115
> URL: https://issues.apache.org/jira/browse/FLINK-32115
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Runtime
> Affects Versions: 1.16.1
> Reporter: xiaogang zhou
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)