wu-sheng opened a new issue, #9661:
URL: https://github.com/apache/skywalking/issues/9661

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no 
similar feature requirement.
   
   
   ### Description
   
   For years, we have supported database monitoring from traces, including 
metrics analysis and slow SQL sampling. 
   But the metrics are still very general.
   
   <img width="1681" alt="image" 
src="https://user-images.githubusercontent.com/5441976/191632167-352368a1-5a62-4be7-96b0-46d088caf6b9.png">
   
   We can't identify hot tables or read/write loads, the way we recently 
made possible for visual cache analysis in 
https://github.com/apache/skywalking/pull/9622
   
   We tried to use Apache ShardingSphere to parse SQL and extract key 
information such as table names, operation (select/update/insert/delete/DDL), 
and key conditions. This is technically possible, but it is not fast enough, 
because SkyWalking OAP should handle 1m+ spans/s.
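
To make the throughput requirement concrete, the per-span time budget can be estimated with simple arithmetic. This is only a rough sketch: the 1m spans/s figure comes from the text above, while the thread count of 16 and the name `per_span_budget_ns` are assumptions for illustration. The issue does not state how many threads OAP uses or what share of spans carries SQL.

```python
def per_span_budget_ns(spans_per_second: int, worker_threads: int) -> float:
    """Average time one thread may spend on one span, in nanoseconds.

    If `worker_threads` spans are processed in parallel, each thread only
    has to complete spans_per_second / worker_threads spans every second.
    """
    if spans_per_second <= 0 or worker_threads <= 0:
        raise ValueError("both arguments must be positive")
    return worker_threads * 1e9 / spans_per_second


# 1m spans/s is the figure from the issue; 16 threads is an assumed value.
budget = per_span_budget_ns(1_000_000, 16)
print(f"{budget:.0f} ns per span")  # 16000 ns per span
```

Under these assumptions the whole pipeline, not just SQL parsing, has about 16 microseconds per span, so the parser can only take a fraction of that.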
   
   ____
   
   Here, after months of thinking, I want to propose that SkyWalking provide 
a new module that implements our own SQL parser. As with all our previous 
work, there is no intent to compete with the ShardingSphere SQL parser; this 
parser is oriented toward APM scenarios.
   
   ### What does APM-oriented mean?
   We should not try to build a complete SQL grammar tree like a SQL database 
does, because we generally just want to fetch important information from SQL 
statements and build read/write metrics, the tables in the SQL (but not 
columns), and key conditions (such as table join conditions).
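
As a hedged illustration of what "APM-oriented" output could look like, the sketch below keeps only a small footprint per statement and aggregates read/write load per table. The names `SqlFootprint` and `table_load`, and the grouping of operations into reads and writes, are hypothetical; nothing here is an existing SkyWalking API.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed grouping of operations into read and write loads.
READ_OPS = {"SELECT"}
WRITE_OPS = {"INSERT", "UPDATE", "DELETE"}


@dataclass(frozen=True)
class SqlFootprint:
    """The only facts kept per statement (hypothetical shape)."""
    operation: str               # SELECT / INSERT / UPDATE / DELETE / DDL
    tables: tuple = ()           # table names only, no columns
    join_conditions: tuple = ()  # e.g. "o.user_id = u.id"


def table_load(footprints):
    """Count reads and writes per table; DDL statements are ignored."""
    reads, writes = Counter(), Counter()
    for fp in footprints:
        if fp.operation in READ_OPS:
            reads.update(fp.tables)
        elif fp.operation in WRITE_OPS:
            writes.update(fp.tables)
    return reads, writes


sample = [
    SqlFootprint("SELECT", ("orders", "users"), ("o.user_id = u.id",)),
    SqlFootprint("SELECT", ("orders",)),
    SqlFootprint("UPDATE", ("orders",)),
    SqlFootprint("DDL", ("orders",)),
]
reads, writes = table_load(sample)
print(reads.most_common(1))  # hottest table by reads
```

Because columns and full expression trees are never stored, the per-statement result stays small, which is the point of not building a complete grammar tree.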
   
   ### What tech stack should we choose?
   Antlr is still our first choice, as 
   1. It is stable and has been widely used for years.
   2. The SkyWalking OAL engine is built on top of it, so we could have a 
consistent tech stack.
   
   We should only consider building our own grammar tree analysis stack if 
Antlr can't provide enough performance (highly unlikely).
   
   ### About SQL analysis
   There are some Antlr grammar definitions for SQL in the ShardingSphere and 
Antlr repositories, and also in various blogs. But we should build our own 
rather than `copy/paste` from them (this is not about the license). 
   We need a simplified grammar tree for SQL (SQL92/MySQL/PostgreSQL, etc.) 
that fetches only the important information:
   1. Tables
   2. Join conditions
   3. Operation type: DML (select/insert/update/delete) or DDL (a general 
type, not a performance concern).
   4. Detection of hard-coded conditions used instead of preparedStatement 
parameters
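
The four items above can be sketched with a few regular expressions. This is deliberately not Antlr and not a proposed implementation: it is a toy, with hypothetical names (`extract_footprint`, `TABLE_RE`, `JOIN_RE`, `LITERAL_RE`), meant only to show the shape of the extracted data. It looks only at the leading keyword for the operation type and will misfire on subqueries, CTEs, keywords inside string literals, and dialect-specific syntax, which is exactly where a real simplified grammar is needed.

```python
import re

DML = ("SELECT", "INSERT", "UPDATE", "DELETE")

# Table names follow FROM / JOIN / INTO / UPDATE in common DML.
TABLE_RE = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)
# Text between ON and the next clause keyword is treated as a join condition.
JOIN_RE = re.compile(
    r"\bON\s+(.+?)"
    r"(?=\s+(?:INNER|LEFT|RIGHT|FULL|CROSS|JOIN|WHERE|GROUP|ORDER|LIMIT)\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)
# A comparison against a quoted string or a number counts as hard-coded;
# a comparison against `?` (a prepared statement placeholder) does not match.
LITERAL_RE = re.compile(
    r"[\w.]+\s*(?:<>|!=|<=|>=|=|<|>)\s*(?:'(?:[^']|'')*'|\d+(?:\.\d+)?)"
)


def extract_footprint(sql: str) -> dict:
    """Return operation, tables, join conditions and hard-coded conditions."""
    words = sql.split(None, 1)
    first = words[0].upper() if words else ""
    return {
        # Anything that is not one of the four DML verbs is lumped into DDL.
        "operation": first if first in DML else "DDL",
        # dict.fromkeys removes duplicates while keeping first-seen order.
        "tables": list(dict.fromkeys(TABLE_RE.findall(sql))),
        "join_conditions": [c.strip() for c in JOIN_RE.findall(sql)],
        "hard_coded": [m.group(0) for m in LITERAL_RE.finditer(sql)],
    }


sql = (
    "SELECT o.id FROM orders o JOIN users u ON o.user_id = u.id "
    "WHERE o.status = 'PAID' AND u.id = ?"
)
print(extract_footprint(sql))
```

In this example `o.status = 'PAID'` is reported as a hard-coded condition, while `u.id = ?` is not, because it uses a placeholder.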
   
   There is a similar feature from [uptrace SQL 
parser](https://uptrace.dev/sql-parsing). Their footprint idea is to fetch 
important information too.
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   The previous PoC of SQL parsing through the ShardingSphere kernel:
   
   - https://github.com/apache/skywalking/issues/5838
   - https://github.com/apache/shardingsphere/issues/8208
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
