wu-sheng opened a new issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208
Hi @tristaZero , as promised, I run some tests about the performance of SQL
Parser. I choose the MySQL grammar randomly.
Here is the tested SQLs, with the codes I run in the tests.
```java
public class SQLParseBenchmark {
@Test
@Benchmark
public void parseSimpleSQLPerf() {
String sqlStatement = "select * from table_1 where col = ?";
ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement,
false);
}
@Test
@Benchmark
public void parseComplexSQLPerf() {
String sqlStatement = "SELECT \n" +
" productname, \n" +
" buyprice\n" +
"FROM\n" +
" products p1\n" +
"WHERE\n" +
" buyprice > (SELECT \n" +
" AVG(buyprice)\n" +
" FROM\n" +
" products\n" +
" WHERE\n" +
" productline = p1.productline)";
ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement,
false);
}
@Test
@Benchmark
public void parseComplexSQL2Perf() {
String sqlStatement = "SELECT \n" +
" customerNumber, \n" +
" customerName\n" +
"FROM\n" +
" customers\n" +
"WHERE\n" +
" EXISTS( SELECT \n" +
" orderNumber, SUM(priceEach * quantityOrdered)\n" +
" FROM\n" +
" orderdetails\n" +
" INNER JOIN\n" +
" orders USING (orderNumber)\n" +
" WHERE\n" +
" customerNumber = customers.customerNumber\n" +
" GROUP BY orderNumber\n" +
" HAVING SUM(priceEach * quantityOrdered) > 60000);";
ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement,
false);
}
}
```
# Single Thread Case W/O Cache
```
Benchmark Mode Cnt Score Error
Units
SQLParseBenchmark.parseSimpleSQLPerf thrpt 5 200801.158 ± 28973.845
ops/s
SQLParseBenchmark.parseComplexSQLPerf thrpt 5 56151.571 ± 3432.966
ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt 5 34157.511 ± 3750.865
ops/s
```
The length of SQL affects the performance pretty much, drop from 200k to
34k, with the complexity of SQLs. And have 10%+- fluctuation.
# Active Cache With 1k/100k possible SQLs
I simply added a variable `I` to provide 1k different SQL samples to warm up
the cache like this.
```java
public class SQLParseBenchmark {
public static int POSSIBILITY = 1000;
public static int I = 1;
@Test
@Benchmark
public void parseSimpleSQLPerf() {
String sqlStatement = "select * from table_1 where col = " + I;
ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement,
true);
if (I > POSSIBILITY) {
I = 1;
} else {
I++;
}
}
@Test
@Benchmark
public void parseComplexSQLPerf() {
String sqlStatement = "SELECT \n" +
" productname, \n" +
" buyprice\n" +
"FROM\n" +
" products p1" + I +
" WHERE\n" +
" buyprice > (SELECT \n" +
" AVG(buyprice)\n" +
" FROM\n" +
" products\n" +
" WHERE\n" +
" productline = p1.productline)";
ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement,
true);
if (I > POSSIBILITY) {
I = 1;
} else {
I++;
}
}
@Test
@Benchmark
public void parseComplexSQL2Perf() {
String sqlStatement = "SELECT \n" +
" customerNumber, \n" +
" customerName\n" +
"FROM\n" +
" customers" + I +
" WHERE\n" +
" EXISTS( SELECT \n" +
" orderNumber, SUM(priceEach * quantityOrdered)\n" +
" FROM\n" +
" orderdetails\n" +
" INNER JOIN\n" +
" orders USING (orderNumber)\n" +
" WHERE\n" +
" customerNumber = customers.customerNumber\n" +
" GROUP BY orderNumber\n" +
" HAVING SUM(priceEach * quantityOrdered) > 60000);";
ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement,
true);
if (I > POSSIBILITY) {
I = 1;
} else {
I++;
}
}
public static void main(String[] args) throws RunnerException {
Options opt = new
OptionsBuilder().include(SQLParseBenchmark.class.getSimpleName())
.forks(1)
.warmupIterations(3)
.threads(1)
.syncIterations(false)
.measurementIterations(5)
.build();
new Runner(opt).run();
}
}
```
```
Benchmark Mode Cnt Score Error
Units
SQLParseBenchmark.parseSimpleSQLPerf thrpt 5 5055988.471 ± 900336.628
ops/s
SQLParseBenchmark.parseComplexSQLPerf thrpt 5 2760262.638 ± 395728.318
ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt 5 1686125.536 ± 476530.762
ops/s
```
The performance increases clearly, for the most complex SQL case, it
increases from 34k to 1.6m, nearly 47x.
Then I change the possibility of SQLs to 100k, to make LRU cache not very
effective, interesting things happened.
```
Benchmark Mode Cnt Score Error
Units
SQLParseBenchmark.parseSimpleSQLPerf thrpt 5 126939.502 ± 31545.825
ops/s
SQLParseBenchmark.parseComplexSQLPerf thrpt 5 28216.660 ± 11769.703
ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt 5 18382.000 ± 6581.555
ops/s
```
If the LRU cache doesn't work efficiently, it is worse than no cache. 34k
down to 18k, nearly -50%
# Concurrency Tests
I switch to the concurrency cases, first, LRU cache works(1k SQL samples),
and activate 6 threads
```
SQLParseBenchmark.parseSimpleSQLPerf thrpt 5 10535380.660 ±
417418.673 ops/s
SQLParseBenchmark.parseComplexSQLPerf thrpt 5 7126066.718 ±
1634052.358 ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt 5 6251270.156 ±
1361498.623 ops/s
```
168k -> 625k, 4x faster in 6 threads(My Local Laptop is 4Core Intel Core i7,
MacBook Pro (15-inch, 2017)).
Then make LRU fails again(100k SQL samples), and still activate 6 threads.
```
SQLParseBenchmark.parseSimpleSQLPerf thrpt 5 159424.392 ± 66291.903
ops/s
SQLParseBenchmark.parseComplexSQLPerf thrpt 5 42207.768 ± 9511.269
ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt 5 27640.844 ± 11868.306
ops/s
```
Only 18k -> 27k.
# One More
I am just curious about the performance of no LRU in the concurrency case,
then I keep the 6 threads and close the LRU
```
Benchmark Mode Cnt Score Error
Units
SQLParseBenchmark.parseSimpleSQLPerf thrpt 5 572351.472 ± 27852.061
ops/s
SQLParseBenchmark.parseComplexSQLPerf thrpt 5 164481.782 ± 24937.811
ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt 5 88742.504 ± 81617.211
ops/s
```
The most interesting thing happened, it could provide 88k, comparing to 27k
with LRU ON but with large data set.
# Conclusion
The concurrent performance makes me a little concerned (not a block) to use
this in the trace analysis core. Because, the performance of the OAP
backend(one node) could be 10k+ segments/s very easily, in each segment, there
could be 5-10 SQL statements normally. And the execution of the analysis would
be concurrently for sure.
So, the current performance seems an impact on the current status.
Could you have a deeper discussion about what we could do to improve this?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]