Emitswang commented on issue #9628:
URL: https://github.com/apache/shardingsphere/issues/9628#issuecomment-796641932
Hi @tristaZero
I'm glad that my suggestion has been accepted. For the first point, let me
briefly explain the background.
Previously, we implemented the sharding rules in a MyBatis interceptor at the
business application level. That is to say, the SQL that currently reaches the
proxy layer already names the physical database and table, in the form
`select * from sharding_13_25_db.t_sharding_table_1 where uid = 13000251`.
At the current stage, we want sharding-proxy to take over all SQL requests.
To do that without changing the business code, I need to create a config file
for every corresponding DB so that sharding-proxy can take over all the SQL
requests smoothly.
For example, suppose I configure only one sharding rule config file, for
`schemaName: sharding_db`. If the business application then sends the SQL
request `select * from sharding_13_25_db.t_sharding_table_1 where uid =
13000251`, an exception occurs: `error 1049 (42000): unknown database`.
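
A rough sketch of this failure mode, not the proxy's actual code: it assumes the schema is resolved from the qualifier in front of the table name and must match a configured `schemaName`. The names `CONFIGURED_SCHEMAS`, `UnknownDatabaseError`, and `resolve_schema` are hypothetical.

```python
import re

# Hypothetical: the only schemaName that has a config file.
CONFIGURED_SCHEMAS = frozenset({"sharding_db"})


class UnknownDatabaseError(Exception):
    """Toy stand-in for MySQL error 1049 (42000)."""


def resolve_schema(sql: str, configured=CONFIGURED_SCHEMAS) -> str:
    """Return the schema qualifier of the first `from db.table` in sql."""
    match = re.search(r"\bfrom\s+(\w+)\.\w+", sql, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no schema-qualified table found")
    schema = match.group(1)
    if schema not in configured:
        # The physical DB name has no matching config, so the lookup fails.
        raise UnknownDatabaseError(
            f"error 1049 (42000): unknown database '{schema}'"
        )
    return schema
```

With only `sharding_db` configured, the logical query resolves, while the query against `sharding_13_25_db` raises the toy error.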
Therefore, we need a large number of configuration files for the proxy to
accept all SQL requests normally. This means we would have 2000 dataSources
to configure, although in fact we only want to split them across 20 data
source instances. So we don't use the sharding function provided by the proxy;
we only do read-write separation.
You may ask why the business applications don't change the SQL to `select *
from sharding_db.t_sharding where uid = 13000251`. The main reason is the cost
of implementing this on the business side, which involves the following changes:
1. At present, some SQL scenarios in the business cannot derive the sharding
rule from the SQL itself. They first need to run association queries against
a dictionary mapping table, and then rewrite the SQL with the resulting
sharding rule.
2. The business has table-scanning logic that traverses by table name,
scanning data one table at a time from `00_db.t_0` to `99_db.t_9`.
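
The scan order in item 2 can be pictured as a nested loop. A minimal sketch, assuming 100 databases with two-digit prefixes and 10 tables each (inferred only from the `00_db.t_0` to `99_db.t_9` range); the function name `physical_tables` is hypothetical.

```python
from typing import Iterator


def physical_tables(db_count: int = 100, tables_per_db: int = 10) -> Iterator[str]:
    """Yield physical table names in scan order, database by database."""
    for db in range(db_count):
        for table in range(tables_per_db):
            # Two-digit, zero-padded database prefix, e.g. 07_db.t_3
            yield f"{db:02d}_db.t_{table}"
```

Logic like this depends on the physical names, which is why switching the business SQL to a single logical name is costly.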
Another risk I am worried about is that 2000 table configurations will
produce some large objects. Could this cause runtime problems, such as
frequent full GC or OOM, or even affect proxy performance?
In theory, parallelization can speed up the `build` process, and I'm
preparing to make the relevant modifications to verify this. If I can, I'd
like to become a contributor to the project. However, as I am new to the
project, the process of submitting a PR is not very clear to me, so I need to
learn it first. For example:
1. Is the scope of change to be evaluated by myself or to be implemented
after your evaluation and confirmation?
2. If the code I submit does not meet the requirements of the project,
will the PR wait until I finish the modifications?
3. Is it necessary to add new test cases for the parallelization, or is
running the original test cases enough?
4. Is there a deadline requirement?
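
To make the parallelization idea above concrete, here is a minimal sketch, assuming the per-schema build steps are independent of each other. `build_schema` is a hypothetical placeholder, not the project's real `build` code, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable


def build_schema(name: str) -> dict:
    """Hypothetical placeholder for the per-schema build step."""
    return {"schema": name, "built": True}


def build_all(names: Iterable[str], max_workers: int = 8) -> Dict[str, dict]:
    """Build every schema concurrently and return the results keyed by name."""
    names = list(names)  # materialize once, since names is iterated twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so zip pairs them correctly.
        results = pool.map(build_schema, names)
        return dict(zip(names, results))
```

Whether this actually helps depends on how much of the real build is shared state or I/O, which is what the planned verification would need to measure.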