[GitHub] [skywalking] wu-sheng opened a new issue, #9661: [Feature] A lightweight and APM-oriented SQL parser module

GitBox Wed, 21 Sep 2022 17:39:58 -0700


wu-sheng opened a new issue, #9661:
URL: https://github.com/apache/skywalking/issues/9661

### Search before asking

- [X] I had searched in the
[issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no
similar feature requirement.

### Description

For years, we have supported database monitoring from traces, including
metrics analysis and slow SQL sampling.
But the metrics are still very general.

We can't identify hot tables or read/write loads the way we recently did for
visual cache analysis, https://github.com/apache/skywalking/pull/9622

We tried to use Apache ShardingSphere to parse SQL and extract the key
information, such as table names, operation (select/update/insert/delete/DDL),
and key conditions. This is possible from a technical perspective, but it is
not fast enough, as the SkyWalking OAP should handle at least 1m spans/s.

____

Here, after months of thinking, I want to propose that SkyWalking begin to
provide a new module that implements a SQL parser of our own. As with all the
work we have done, the intent is not to compete with the ShardingSphere SQL
parser but to target APM scenarios.

### What does APM-oriented mean?
We should not try to build a complete SQL grammar tree like a SQL database
does, because generally we just want to fetch important information from SQL
statements: read/write metrics, the tables referenced in a statement (but not
columns), and key conditions (such as table join conditions).
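
As a rough illustration of how little of the statement is needed for read/write metrics, here is a minimal sketch that classifies a statement by its leading keyword alone. The names `SqlOperation` and `SqlOperationClassifier` are hypothetical and not part of SkyWalking, and the keyword list is an assumption. Statements that start with a CTE (`WITH ...`) or a comment fall through to `UNKNOWN` in this sketch.

```java
import java.util.Locale;

// Hypothetical names for illustration only; not part of SkyWalking.
enum SqlOperation { READ, WRITE, DDL, UNKNOWN }

final class SqlOperationClassifier {
    private SqlOperationClassifier() { }

    // Classifies a statement by its first whitespace-delimited keyword.
    // No grammar tree is built; anything unrecognised is UNKNOWN.
    static SqlOperation classify(String sql) {
        String trimmed = sql.trim();
        if (trimmed.isEmpty()) {
            return SqlOperation.UNKNOWN;
        }
        String keyword = trimmed.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        switch (keyword) {
            case "SELECT":
                return SqlOperation.READ;
            case "INSERT":
            case "UPDATE":
            case "DELETE":
                return SqlOperation.WRITE;
            case "CREATE":
            case "ALTER":
            case "DROP":
            case "TRUNCATE":
                // DDL is kept as one general bucket.
                return SqlOperation.DDL;
            default:
                return SqlOperation.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("select id from orders"));
        System.out.println(classify("UPDATE orders SET status = ?"));
        System.out.println(classify("drop table orders"));
    }
}
```

A real module would still need a grammar to find tables and conditions; this only shows that the read/write split does not require one.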

### What tech stack should we choose?
Antlr is still our first choice, as
1. It is stable and has been widely used for years.
2. The SkyWalking OAL engine is built on top of it, so we could have a
consistent tech stack.

We should only consider building our own grammar tree analysis stack if
Antlr can't provide enough performance (highly unlikely).

### About SQL analysis
There are some Antlr grammar definitions for SQL in the ShardingSphere and
Antlr repositories, and others in various blogs. But we should build
our own rather than `copy/paste` from them (this is not about the license).
We need a simplified grammar tree for SQL (SQL92/MySQL/PostgreSQL, etc.) that
only fetches the important information:
1. Tables
2. Join conditions
3. Operation type, DML(select/insert/update/delete) or DDL(a general type,
not a performance concern).
4. Hard-coded conditions (literal values in the SQL text), as opposed to PreparedStatement placeholders
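
To make the four items above concrete, the following is a minimal sketch of the kind of output such a module could produce. It uses a plain regex tokenizer rather than Antlr, purely to keep the example self-contained; it is not the proposed implementation. The class name `SqlFootprintSketch` and all of its methods are hypothetical. Known gaps in this sketch: comma-separated table lists (`FROM a, b`), quoted identifiers, comments, and join conditions other than a simple `ON x = y` are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of SkyWalking and not an Antlr grammar.
final class SqlFootprintSketch {
    // Token kinds, tried in order: single-quoted string, identifier
    // (dots allowed, e.g. o.id), number, any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
        "'(?:[^']|'')*'|[A-Za-z_][A-Za-z0-9_.]*|\\d+(?:\\.\\d+)?|\\S");

    private SqlFootprintSketch() { }

    static List<String> tokenize(String sql) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sql);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // 1. Tables: the identifier that directly follows FROM, JOIN, INTO or UPDATE.
    static Set<String> tables(String sql) {
        List<String> tokens = tokenize(sql);
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String kw = tokens.get(i).toUpperCase(Locale.ROOT);
            if (kw.equals("FROM") || kw.equals("JOIN") || kw.equals("INTO") || kw.equals("UPDATE")) {
                String next = tokens.get(i + 1);
                char first = next.charAt(0);
                if (Character.isLetter(first) || first == '_') {
                    result.add(next);
                }
            }
        }
        return result;
    }

    // 2. Join conditions: only the simple form "ON left = right" is recognised.
    static List<String> joinConditions(String sql) {
        List<String> tokens = tokenize(sql);
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 < tokens.size(); i++) {
            if (tokens.get(i).equalsIgnoreCase("ON") && tokens.get(i + 2).equals("=")) {
                result.add(tokens.get(i + 1) + " = " + tokens.get(i + 3));
            }
        }
        return result;
    }

    // 4. Hard-coded values: string or numeric literals written into the SQL text.
    static int literalCount(String sql) {
        int count = 0;
        for (String t : tokenize(sql)) {
            char first = t.charAt(0);
            if (first == '\'' || Character.isDigit(first)) {
                count++;
            }
        }
        return count;
    }

    // PreparedStatement-style placeholders, counted for contrast with literals.
    static int placeholderCount(String sql) {
        int count = 0;
        for (String t : tokenize(sql)) {
            if (t.equals("?")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sql = "SELECT o.id FROM orders o JOIN users u ON o.user_id = u.id "
            + "WHERE o.status = 'PAID' AND u.age > ?";
        System.out.println(tables(sql));
        System.out.println(joinConditions(sql));
        System.out.println(literalCount(sql) + " literal(s), " + placeholderCount(sql) + " placeholder(s)");
    }
}
```

Item 3 (operation type) can be read from the first token. A grammar-based parser would replace the keyword lookahead here with proper rules, which is where Antlr comes in.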

There is a similar feature in the [uptrace SQL
parser](https://uptrace.dev/sql-parsing). Their footprint idea is also about
fetching the important information.

### Use case

_No response_

### Related issues

The previous PoC, which parsed SQL through the ShardingSphere kernel:

- https://github.com/apache/skywalking/issues/5838
- https://github.com/apache/shardingsphere/issues/8208

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of
Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

