[
https://issues.apache.org/jira/browse/CALCITE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
neoremind updated CALCITE-3873:
-------------------------------
Description:
Out of curiosity, I used a flame graph to profile a simple query. The code
snippet is shown below.
{code:java}
// jsonPath(...) and close(...) are helper methods that are not shown here.
String sql = "select empno, gender, name from EMPS where name = 'John'";
Connection connection = null;
Statement statement = null;
try {
  Properties info = new Properties();
  info.put("model", jsonPath("smart"));
  connection = DriverManager.getConnection("jdbc:calcite:", info);
  String x = null;
  long start = System.currentTimeMillis();
  // Run the same query repeatedly so the profiler collects enough samples.
  for (int i = 0; i < 50000; i++) {
    statement = connection.createStatement();
    final ResultSet resultSet = statement.executeQuery(sql);
    // Read every column so that result set iteration is exercised too.
    while (resultSet.next()) {
      x = resultSet.getInt(1)
          + resultSet.getString(2)
          + resultSet.getString(3);
    }
  }
} catch (SQLException e) {
  e.printStackTrace();
} finally {
  close(connection, statement);
}
{code}
I have attached the generated flame graph: [^pic1.svg]
{code:java}
3% on sql2rel,
9% on query optimization,
62% on code generation and implementation,
20% on result set iteration and checking,
…
{code}
I hope this graph is informative. Since I have only recently started learning
Calcite, I cannot tell where tuning should start, but one small detail in the
graph caught my attention: there are many reflection invocations in
_Prepare#trimUnusedFields_. So I spent some time trying to reduce this small
overhead.
I optimized _ReflectiveVisitDispatcher_ by introducing a global, size-limited
_Guava_ cache for the looked-up methods. I also added full unit tests for
_ReflectUtil_.
I counted the references to _ReflectUtil#createMethodDispatcher_ and
_ReflectUtil#createDispatcher_ (see below). There are 68 possible invocations
in total, so the cache size is bounded. By caching all the methods for the
lifetime of the process, we can eliminate the overhead of looking up methods
via reflection.
{code:java}
org.apache.calcite.rel.rel2sql.RelToSqlConverter: 18 possible invocations.
org.apache.calcite.sql2rel.RelDecorrelator: 15 possible invocations.
org.apache.calcite.sql2rel.RelFieldTrimmer: 11 possible invocations.
org.apache.calcite.sql2rel.RelStructuredTypeFlattener.RewriteRelVisitor: 22 possible invocations.
org.apache.calcite.interpreter.Interpreter.CompilerImpl: 2 possible invocations.
{code}
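To illustrate what "limited size" means here, the following is a minimal sketch of a size-bounded cache. It is not the code from the patch: it uses a JDK _LinkedHashMap_ with LRU eviction as a stand-in for a Guava cache built with a maximum size, and the class name _BoundedLruCache_ is made up for this example. The five counts above sum to 68, so a bound of that order would hold every entry.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Calcite or Guava. */
final class BoundedLruCache {
  private BoundedLruCache() {}

  /** Creates a thread-safe map that keeps at most maxEntries entries. */
  static <K, V> Map<K, V> create(final int maxEntries) {
    // accessOrder = true: iteration order runs from least to most recently used.
    Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the least recently used entry
        // once the bound is exceeded.
        return size() > maxEntries;
      }
    };
    // In access-order mode even get() reorders entries, so every call is
    // synchronized.
    return Collections.synchronizedMap(lru);
  }

  public static void main(String[] args) {
    Map<String, Integer> cache = create(2);
    cache.put("a", 1);
    cache.put("b", 2);
    cache.get("a");     // touch "a" so that "b" becomes the eldest entry
    cache.put("c", 3);  // exceeds the bound, so "b" is evicted
    System.out.println(cache.keySet());
  }
}
```

With a bound at least as large as the number of distinct lookups, nothing is ever evicted and the limit only acts as a safety net against unbounded growth.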
Before the global cache was introduced, the cache was per
_ReflectiveVisitDispatcher_ instance. Now different _ReflectiveVisitDispatcher_
instances in different threads can reuse the cached methods.
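The difference between per-instance and global caching can be sketched as follows. This is a simplified illustration under my own assumptions, not the actual _ReflectUtil_ code: the class _GlobalVisitMethodCache_, its _lookup_ method, and the _Node_/_Leaf_/_PrintVisitor_ types are hypothetical, and a _ConcurrentHashMap_ stands in for the Guava cache so the example needs only the JDK.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a process-wide visit-method cache. */
final class GlobalVisitMethodCache {
  /** Cache key: (visitor class, visitee class, method name). */
  static final class Key {
    final Class<?> visitorClass;
    final Class<?> visiteeClass;
    final String methodName;

    Key(Class<?> visitorClass, Class<?> visiteeClass, String methodName) {
      this.visitorClass = visitorClass;
      this.visiteeClass = visiteeClass;
      this.methodName = methodName;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return visitorClass == k.visitorClass
          && visiteeClass == k.visiteeClass
          && methodName.equals(k.methodName);
    }

    @Override public int hashCode() {
      return Objects.hash(visitorClass, visiteeClass, methodName);
    }
  }

  // One static map shared by all callers and threads. Optional lets us cache
  // "no such method" results as well.
  private static final Map<Key, Optional<Method>> CACHE = new ConcurrentHashMap<>();

  /** Counts reflective lookups that actually ran (cache misses). */
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  /** Returns the visit method for the given classes, or null if none exists. */
  static Method lookup(Class<?> visitorClass, Class<?> visiteeClass, String name) {
    Key key = new Key(visitorClass, visiteeClass, name);
    return CACHE.computeIfAbsent(key, k -> {
      LOOKUPS.incrementAndGet();
      return Optional.ofNullable(find(k));
    }).orElse(null);
  }

  // Walks up the visitee's superclass chain until a public one-argument
  // method with the requested name accepts that class.
  private static Method find(Key k) {
    for (Class<?> c = k.visiteeClass; c != null; c = c.getSuperclass()) {
      try {
        return k.visitorClass.getMethod(k.methodName, c);
      } catch (NoSuchMethodException e) {
        // No exact match for this class; try its superclass.
      }
    }
    return null;
  }

  // Example types, made up for this sketch.
  public static class Node {}
  public static class Leaf extends Node {}
  public static class PrintVisitor {
    public String visit(Node n) {
      return "visited " + n.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) throws Exception {
    Method first = lookup(PrintVisitor.class, Leaf.class, "visit");
    Method second = lookup(PrintVisitor.class, Leaf.class, "visit");
    System.out.println(first.invoke(new PrintVisitor(), new Leaf()));
    System.out.println("same Method object: " + (first == second));
    System.out.println("reflective lookups: " + LOOKUPS.get());
  }
}
```

Because the map is static, a second dispatcher created later, or on another thread, hits the cache for any (visitor class, visitee class, method name) combination that was already resolved instead of repeating the reflective search.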
See [^pic2.svg]: after tuning, _trimUnusedFields_ takes only 0.64% of the
sampling time, compared with 1.38% previously. I think this will help in many
other places as well.
> Use global caching for ReflectiveVisitDispatcher implementation
> ---------------------------------------------------------------
>
> Key: CALCITE-3873
> URL: https://issues.apache.org/jira/browse/CALCITE-3873
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.22.0
> Reporter: neoremind
> Priority: Minor
> Attachments: pic1.svg, pic2.svg
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)