sandynz opened a new issue, #24291:
URL: https://github.com/apache/shardingsphere/issues/24291

   ## Feature Request
   
   ### Is your feature request related to a problem?
   Yes
   
   ### Describe the feature you would like.
   Currently, `InventoryTaskSplitter.getTableRecordsCount` use `SELECT COUNT(*) 
FROM table` to query records count. It might cost too much time, even more than 
time out of connection.
   
   A possible solution is to query estimated records count from database.
   
   Some investigation on databases:
   
   #### MySQL investigation
   
   Query from `INFORMATION_SCHEMA.TABLES`. Example:
   
   ```
   mysql> select table_schema, table_name, table_rows from 
INFORMATION_SCHEMA.TABLES where table_schema='sysbench_ds_1';
   +---------------+------------+------------+
   | table_schema  | table_name | table_rows |
   +---------------+------------+------------+
   | sysbench_ds_1 | sbtest1    |  138247749 |
   +---------------+------------+------------+
   1 row in set (0.02 sec)
   
   mysql> analyze table sysbench_ds_1.sbtest1;
   +-----------------------+---------+----------+----------+
   | Table                 | Op      | Msg_type | Msg_text |
   +-----------------------+---------+----------+----------+
   | sysbench_ds_1.sbtest1 | analyze | status   | OK       |
   +-----------------------+---------+----------+----------+
   1 row in set (0.46 sec)
   
   mysql> select table_schema, table_name, table_rows from 
INFORMATION_SCHEMA.TABLES where table_schema='sysbench_ds_1';
   +---------------+------------+------------+
   | table_schema  | table_name | table_rows |
   +---------------+------------+------------+
   | sysbench_ds_1 | sbtest1    |  138082248 |
   +---------------+------------+------------+
   1 row in set (0.02 sec)
   
   mysql> explain select count(1) from sysbench_ds_1.sbtest1;
   
+----+-------------+---------+------------+-------+---------------+------+---------+------+-----------+----------+-------------+
   | id | select_type | table   | partitions | type  | possible_keys | key  | 
key_len | ref  | rows      | filtered | Extra       |
   
+----+-------------+---------+------------+-------+---------------+------+---------+------+-----------+----------+-------------+
   |  1 | SIMPLE      | sbtest1 | NULL       | index | NULL          | k_1  | 4 
      | NULL | 138082248 |   100.00 | Using index |
   
+----+-------------+---------+------------+-------+---------------+------+---------+------+-----------+----------+-------------+
   1 row in set, 1 warning (0.16 sec)
   
   mysql> select count(1) from sysbench_ds_1.sbtest1;
   +-----------+
   | count(1)  |
   +-----------+
   | 140000000 |
   +-----------+
   1 row in set (36.82 sec)
   ```
   
   `0.02 sec` vs `36.82 sec` on 140 millions records table. The estimated 
records count has accurateness lost: `(140000000-138082248)/140000000.0*100 ~= 
1.37`, it's acceptable.
   
   #### PostgreSQL investigation
   
   Query from `pg_class`. Example:
   ```
   test1=# select count(1) from t_order;
    count
   -------
        6
   (1 row)
   
   test1=# select relname,reltuples::integer from pg_class where 
relname='t_order';
    relname | reltuples
   ---------+-----------
    t_order |         0
   (1 row)
   
   
   test1=# analyze t_order;
   ANALYZE
   
   test1=# select relname,reltuples::integer from pg_class where 
relname='t_order';
    relname | reltuples
   ---------+-----------
    t_order |         6
   (1 row)
   ```
   
   TODO: it needs more test on large table.
   
   #### Possible changes
   
   - Add method `Optional<String> buildEstimatedCountSQL` in 
`PipelineSQLBuilder`, implement it if database support records count 
estimation. We could support `MySQL`, `PostgreSQL` and `openGauss` for now.
   - If estatimated records count is not supported, then use `buildCountSQL`
   
   Notice:
   - The estimated records count might return `0` from database
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 
[email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to