wu-sheng opened a new issue, #9661: URL: https://github.com/apache/skywalking/issues/9661
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

For years, we have supported database monitoring from traces, including metrics analysis and slow SQL sampling. But the metrics are still very general.

<img width="1681" alt="image" src="https://user-images.githubusercontent.com/5441976/191632167-352368a1-5a62-4be7-96b0-46d088caf6b9.png">

We can't tell the hot tables or the read/write loads, as we recently made possible for visual cache analysis in https://github.com/apache/skywalking/pull/9622

We tried to use Apache ShardingSphere to parse SQL and get the key information, such as table names, operation (select/update/insert/delete/DDL), and key conditions. It is possible from a technical perspective, but not fast enough, as SkyWalking OAP should handle at least 1m spans/s.

____

Here, after months of thinking, I want to propose that SkyWalking provide a new module that implements a SQL parser of our own. As with all the work we have done, there is no intent to compete with the ShardingSphere SQL parser; this one is oriented toward APM scenarios.

### What does APM-oriented mean?

We should not try to build a complete SQL grammar tree like a SQL database does, because generally we just want to fetch the important information from SQL statements and build read/write metrics, the tables in the SQL (but not the columns), and key conditions (such as table join conditions).

### What tech stack should we choose?

Antlr is still our first choice, as

1. It is stable and has been widely used for years.
2. The SkyWalking OAL engine is built on top of it, so we could have a consistent tech stack.

We should only consider building our own grammar tree analysis stack if Antlr can't provide enough performance (highly unlikely).

### About SQL analysis

There are some Antlr grammar definitions for SQL in the ShardingSphere and Antlr repositories, and others in various blogs.
But we should build our own rather than `copy/paste` from them (this is not about the license). We need a simplified grammar tree for SQL (SQL92/MySQL/PostgreSQL, etc.) that only fetches the important information:

1. Tables
2. Join conditions
3. Operation type: DML (select/insert/update/delete) or DDL (a general type, not a performance concern)
4. Hard-coded conditions, rather than preparedStatement detection

There is a similar feature in the [uptrace SQL parser](https://uptrace.dev/sql-parsing). Their footprint idea is also to fetch the important information.

### Use case

_No response_

### Related issues

Last PoC of a SQL parser through the ShardingSphere kernel:

- https://github.com/apache/skywalking/issues/5838
- https://github.com/apache/shardingsphere/issues/8208

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
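
To make the four extraction targets named in the issue more concrete (tables, join conditions, operation type, hard-coded conditions), here is a minimal sketch of what an APM-oriented "footprint" could look like. It is not the Antlr grammar the issue proposes: it uses plain regular expressions as a stand-in, purely to illustrate the shape of the output. The names `SqlFootprint` and `extract_footprint` are hypothetical and do not exist in SkyWalking, and the sketch only handles simple statements (no comma-separated table lists, no quoted identifiers, no dialect-specific syntax).

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch: none of these names are part of SkyWalking.
DML = {"SELECT", "INSERT", "UPDATE", "DELETE"}
DDL = {"CREATE", "ALTER", "DROP", "TRUNCATE"}

# Single-quoted string literals ('' is an escaped quote inside a literal).
STRING_LIT = re.compile(r"'(?:[^']|'')*'")
# Keywords after which a table name is expected in simple statements.
TABLE_INTRO = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
# Equality join conditions of the form: ON <left> = <right>.
JOIN_COND = re.compile(r"\bON\s+([\w.]+)\s*=\s*([\w.]+)", re.IGNORECASE)
# A column compared with (or assigned) a quoted string or bare number,
# i.e. a hard-coded value instead of a '?' placeholder.
LITERAL_COND = re.compile(
    r"[\w.]+\s*(?:=|<>|!=|<=|>=|<|>)\s*(?:'(?:[^']|'')*'|\d+(?:\.\d+)?)"
)


@dataclass
class SqlFootprint:
    operation: str
    tables: list = field(default_factory=list)
    join_conditions: list = field(default_factory=list)
    hard_coded: list = field(default_factory=list)

    @property
    def load(self) -> str:
        """Classify the statement for read/write metrics."""
        if self.operation == "SELECT":
            return "read"
        if self.operation in DML:
            return "write"
        return "other"


def extract_footprint(sql: str) -> SqlFootprint:
    # Operation type comes from the first keyword; all DDL is one general type.
    words = sql.split(None, 1)
    first = words[0].upper() if words else ""
    if first in DML:
        operation = first
    elif first in DDL:
        operation = "DDL"
    else:
        operation = "UNKNOWN"

    # Collect hard-coded values before masking string literals.
    hard_coded = [m.group(0) for m in LITERAL_COND.finditer(sql)]

    # Mask string literals so keywords inside them are not mistaken for SQL.
    masked = STRING_LIT.sub("?", sql)

    # Keep first-seen order and drop duplicate table names.
    tables = list(dict.fromkeys(TABLE_INTRO.findall(masked)))
    join_conditions = JOIN_COND.findall(masked)
    return SqlFootprint(operation, tables, join_conditions, hard_coded)
```

Only the statement's skeleton is inspected, and columns are ignored, which is the trade-off the issue describes: enough information for hot-table and read/write metrics without building a full grammar tree. A real implementation along the lines of the proposal would replace the regular expressions with a simplified Antlr grammar.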
