# Motivation Druid SQL should be able to emit SQL metrics and write SQL request logs. It should also help users correlate SQL request with underlying native queries so that users can troubleshoot SQL performance issue using native query metrics.
This is a follow-up work of [#4047](https://github.com/apache/incubator-druid/issues/4047). I'm trying to implement all the missing functionalities except SQL cancellation. # Proposed implementation There are two entries for SQL processing: `SqlResource` for HTTP request and `DruidStatement` for JDBC request. In order to remove duplicated logic, `SqlLifecycle` is added as a central place to manage sql execution. It follows the same design as `QueryLifecycle` except it's thread-safe, since `SqlLifecycle` can be invoked in multi-threads in JDBC use cases. The public interface of SqlLifecycle would be ```java class SqlLifecyle { // state transition: NEW -> INITIALIZED void initialize(SqlQuery); void initialize(String sql, Map<String, Object> queryContext); // state transition: INITIALIZED -> PLANNED void plan(AuthenticationResult); void plan(HttpServletRequest); // state transition: PLANNED -> AUTHORIZED or UNAUTHORIZED Access authorize(); // state transition: AUTHORIZED -> EXECUTING Sequence<Object[]> execute(); // state transition: -> DONE void emitLogsAndMetrics(Throwable, String remoteAddress, long bytesWritten); } ``` In `initialize` method, a `sqlId` is assigned to each SQL query. User can specify `sqlId` in query context, or Druid will generate a UUID for it, just like `queryId` for native query. We call it `sqlId` to make it clear that `sqlId` is for SQL request while `queryId` is for native query, and one SQL can map to several native queries. To help users correlate SQL with native queries, `sqlId` is added to query context of native queries, so that from native query's request log, it's clear whether the native query is from end user or SQL, and if it's from SQL, which SQL is it comes from. Similarly, `queryId` of all native queries underpinning a SQL are collected and stored in SQL's query context key `nativeQueryIds`, so that from SQL's request log or metrics, it's clear which native queries it contains. Finally, `sqlId` and its corresponding `queryId`s are returned in `X-Druid-SQL-Id` and `X-Druid-Native-Query-Ids` HTTP headers. Two SQL metrics are emitted in `emitLogsAndMetrics` method, namely `sql/time` and `sql/bytes`. I perfer `sql/time` to `query/sql/time` proposed in [#4047](https://github.com/apache/incubator-druid/issues/4047) because I think in the past, all metrics starting with "query/" are for native queries, and we should stick to that convention. Also `sql/time` makes SQL a first citizen and is more intuitive for me to understand. SQL metrics will have the following dimensions: id, nativeQueryIds, dataSource, remoteAddress, and success. SQL requests are logged in `emitLogsAndMetrics` via `SqlRequestLogger`. Several built-in impls such as `NoopSqlRequestLogger` and `FileSqlRequestLogger` are provided, of cause we can add more later. `SqlRequestLogger` is configured via `druid.sql.request.logging`, and can be extended in similar way with RequestLogger. ```java public interface SqlRequestLogger { void log(SqlRequestLogLine sqlRequestLogLine) throws IOException; default void start() throws IOException {} default void stop() {} } ``` [ Full content available at: https://github.com/apache/incubator-druid/issues/6301 ] This message was relayed via gitbox.apache.org for [email protected]
