TeslaCN opened a new issue, #18281:
URL: https://github.com/apache/shardingsphere/issues/18281

   ## The current problem in Proxy: the backend exposes multiple entries to the 
frontend
   
   When ShardingSphere-Proxy MySQL was available for the first time, it 
differentiated Text Protocol and Binary Protocol in the backend module 
(directly calling `DatabaseCommunicationEngine`). Text Protocol will process 
all statements, and Binary Protocol will only process DML. This is also the 
origin of many problems that are reported later, some issues or PRs:
   - https://github.com/apache/shardingsphere/issues/9611
   - https://github.com/apache/shardingsphere/pull/10299
   - https://github.com/apache/shardingsphere/issues/11090
   - https://github.com/apache/shardingsphere/issues/12989
   - https://github.com/apache/shardingsphere/issues/17173
   - https://github.com/apache/shardingsphere/pull/17904 This change resulted 
in incorrect transaction status. But the deep-seated reason should be that most 
people are unfamiliar with the protocol and the problem in Proxy backend code 
structure, so they can hardly know that this change will be problematic.
   - Temporary fix for https://github.com/apache/shardingsphere/pull/18161 
#17904: add handle auto commit to all SQL execution entries.
   
   The above list is just the tip of the iceberg. There are many related issues 
that are not enumerated, and some issues that have no issues, but those who are 
familiar with the code know that there are problems:
   - MySQL Proxy uses server-side PreparedStatement to execute DDL, which also 
leads to metadata inconsistency, etc.
   - Proxy uses the connection pool to directly execute `SET` statements, which 
may cause the connection to be polluted. The Proxy side does not handle the 
`SET` and other parameter type statements well, some statements will be 
intercepted, and some statements will be transparently transmitted to a certain 
storage node, resulting in the pollution of a certain database connection in 
the connection pool (related issues: Proxy stuck when stress testing: 
https://github.com/apache/shardingsphere/pull/13682).
   - Taking set encoding as an example, some processing has been done on 
multiple entries of the Proxy protocol.
   
   Temporary solution in the current code: Choose to use 
TextProtocolBackendHandler or DatabaseCommunicateEngine by judging the type of 
SQLStatement. Judging the back-end entry at the front end can temporarily solve 
the problem, but the code will be ugly and difficult to understand.
   
   In addition, the current Proxy backend implementation is overfitting for 
MySQL. MySQL distinguishes between Text and Binary protocols, but PostgreSQL 
does not have this distinction. In fact, the reason is not that other database 
protocols are not like the MySQL protocol, but that the back-end should provide 
what the front-end needs through an unified API. As for the specific protocol 
encoding and decoding, it should be handled by the front-end.
   
   ## Refactoring ideas
   ### gist
   
   - The current `TextProtocolBackendHandlerFactory` carries too much logic and 
is not the only entry for backend. Its logic needs to be adjusted to the main 
process of Proxy.
   - `DatabaseCommunicateEngine` no longer distinguishes whether it is Binary 
or not. Whether `JDBCDriverType` uses Statement or PreparedStatement, JDBC can 
shield the difference. Data encoding as Text or Binary should be handled by the 
front-end protocol.
   - The `DatabaseBackendHandler` before refactoring has the following 
implementations:
     - BroadcastDatabaseBackendHandler iterates over all logical Databases and 
executes statements (SET-like statements).
     - `UnicastDatabaseBackendHandler` executes the statement in the current 
logical Database. If no logical Database is specified (MySQL has no use), it 
will randomly select a data source for execution.
     - `SchemaAssignedDatabaseBackendHandler` almost proxies 
`DatabaseBackendHandler.`
   `DatabaseCommunicationEngine` can directly replace 
org.apache.shardingsphere.proxy.backend.text.data.impl.SchemaAssignedDatabaseBackendHandler.
   - Refactored `TextProtocolBackendHandler` to `ProxyBackendHandler`, MySQL's 
`COM_STMT_EXECUTE` and PostgreSQL/openGauss Portal no longer use 
`DatabaseCommunicateEngine` directly, and use ProxyBackendHandler as the 
backend entry uniformly.
   - ProxyBackendHandler separates creation and execution logic and can be 
created once and executed repeatedly.
   - The separate `PreparedStatementRegistry` for MySQL and PostgreSQL is used 
as part of the ConnectionSession, and the Registry no longer exists alone. 
Extract and abstract the key methods of PreparedStatement as interfaces, and 
add the following states to PreparedStatement to improve PreparedStatement 
performance:
     - `QueryHeader:` unchanged for the same PreparedStatement;
     - `SQLStatementContext:` Only the parameter part needs to be changed. 
There is already a method to do that.
   MySQL PreparedStatement needs to maintain a part of the parameter state to 
support `COM_STMT_SEND_LONG_DATA` (Blob related) and `COM_STMT_FETCH` 
(cursor-like) in the future;
   PostgreSQL's PreparedStatement has no parameter state, and the specific 
state is maintained inside the Portal.
   - Detail parameter optimization:
     - MySQL JDBC Driver sets `prepStmtCacheSize` to `200000`, MySQL 
`max_prepared_stmt_count` is `16382` by default, and the maximum value is 
`1048576`. `200000` is not a suitable value for most scenarios, and will cause 
the cache map of each Connection to occupy about 5 MB of memory. Consider 
modifying it to `8192`.
       - 
<https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_prepared_stmt_count>
   - Use `CompletableFuture` instead of Vert.x Future to decouple asynchronous 
logic from Vert.x.
   
   ![refactor_proxy_backend_en 
drawio](https://user-images.githubusercontent.com/20503072/173790535-4c3d3773-3bc3-4756-b576-e198b7bba5dc.png)
   
   ### Resource releasing
   The original resource releasing logic:
   - Not in a transaction, the database connection is returned to pool after 
each request;
   - In a transaction, the database connection is not returned to pool, but all 
used `Statement`s and `ResultSet`s are closed (except for the resources 
occupied by PostgreSQL Portal suspend, MySQL's PreparedStatement life cycle 
cannot span commands). All resources are not released until the end of the 
transaction.
   
   How to control resources after refactored:
   `ProxyBackendHandler` needs to have the following conditions:
   - `execute` Repeatable execution.
   - `releaseUnusedResources`: (internally determine whether it needs to be 
released) method to conditionally releasing Statement and ResultSet.
   
   The following cursors refer to MySQL COM_STMT_FETCH or PostgreSQL Portal
   `BackendConnection` no longer marks `DatabaseCommunicationEngine` usage 
status, but instead marks usage status of JDBC Connection.
   - Not in a transaction, no cursor:
   `releaseUnusedResource` can release all resources. All JDBC Connections can 
be released after the command is executed.
   - Not in transaction, with cursor:
   (This does not exist in PostgreSQL, just consider MySQL) During cursor 
execution, mark the corresponding JDBC connection as using, and only release 
the unused JDBC connection after the command ends. `releaseUnusedResources` 
does not release the resources needed by the cursor.
   - In a transaction, no cursor:
   (According to the original logic) After a single command is executed, all 
JDBC Connections used in the transaction are not released, and are released 
after the transaction ends. `Statements` and `ResultSets` that are no longer 
used can be released.
   - In a transaction, there is a cursor:
   (According to the original logic) After a single command is executed, all 
JDBC Connections used in the transaction are not released, and are released 
after the transaction ends. The PostgreSQL Portal life cycle is the end of the 
transaction, regardless of the cursor after the end of the transaction. For 
MySQL `COM_STMT_FETCH`, since Proxy does not support it originally, it can be 
considered to support cursors that do not cross transactions after 
reconstruction.
   - When the client connection is disconnected:
   Release all resources.
   
   ### Handling auto commit
   After refactoring, consider using `DatabaseBackendHandler` as an abstract 
class to handle auto commit in the execute logic. At this time, the 
`DatabaseBackendHandler` already knows SQL and `SQLStatement,` so it can 
process auto commit conditionally based on this information.
   For other implementations of `ProxyBackendHandler` such as `DistSQLHandler` 
or `TransactionBackendHandler` or `DatabaseAdminExecutor,` there is no need to 
deal with transaction related logic.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 
[email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to