[GitHub] [shardingsphere] sandynz commented on a diff in pull request #24293: Improve table records count calculation in pipeline job

via GitHub Tue, 21 Feb 2023 22:57:11 -0800


sandynz commented on code in PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#discussion_r1113898689



##########
kernel/data-pipeline/api/src/main/java/org/apache/shardingsphere/data/pipeline/spi/sqlbuilder/PipelineSQLBuilder.java:
##########
@@ -135,6 +135,16 @@ default Optional<String> buildCreateSchemaSQL(String 
schemaName) {
     // TODO keep it for now, it might be used later
     String buildCountSQL(String schemaName, String tableName);
     
+    /**
+     * Build Estimate count SQL.
+     *
+     * @param databaseName database name
+     * @param schemaName schema name
+     * @param tableName table name
+     * @return estimate count sql
+     */
+    String buildEstimateCountSQL(String databaseName, String schemaName, 
String tableName);

Review Comment:
   `buildEstimateCountSQL` could be `buildEstimatedCountSQL`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,12 +171,20 @@ private long getTableRecordsCount(final 
InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new 
LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have 
performance problem
-        String sql = 
PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, 
jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = 
PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, 
jobConfig.getSourceDatabaseType());
+        String sql = 
pipelineSQLBuilder.buildEstimateCountSQL(dumperConfig.getDataSourceName(), 
schemaName, actualTableName);
         try (
                 Connection connection = dataSource.getConnection();
                 PreparedStatement preparedStatement = 
connection.prepareStatement(sql)) {
+            long estimateCount;

Review Comment:
   `estimateCount` could be `result`



##########
kernel/data-pipeline/dialect/postgresql/src/main/java/org/apache/shardingsphere/data/pipeline/postgresql/sqlbuilder/PostgreSQLPipelineSQLBuilder.java:
##########
@@ -85,6 +85,12 @@ private String buildConflictSQL(final DataRecord dataRecord) 
{
         return result.toString();
     }
     
+    @Override
+    public String buildEstimateCountSQL(final String databaseName, final 
String schemaName, final String tableName) {
+        // TODO Support estimate count later.
+        return buildCountSQL(schemaName, tableName);
+    }

Review Comment:
   It's not supported, so it should return `Optional<String>`



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,12 +171,20 @@ private long getTableRecordsCount(final 
InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new 
LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have 
performance problem
-        String sql = 
PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, 
jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = 
PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, 
jobConfig.getSourceDatabaseType());
+        String sql = 
pipelineSQLBuilder.buildEstimateCountSQL(dumperConfig.getDataSourceName(), 
schemaName, actualTableName);
         try (
                 Connection connection = dataSource.getConnection();
                 PreparedStatement preparedStatement = 
connection.prepareStatement(sql)) {
+            long estimateCount;
             try (ResultSet resultSet = preparedStatement.executeQuery()) {
+                resultSet.next();
+                estimateCount = resultSet.getLong(1);
+            }
+            if (estimateCount > 0) {
+                return estimateCount;
+            }
+            try (ResultSet resultSet = 
connection.createStatement().executeQuery(pipelineSQLBuilder.buildCountSQL(schemaName,
 actualTableName))) {

Review Comment:
   1, `buildCountSQL` might cost too much time, so add some log for it, 
includes cost time.
   
   2, connection will return to pool but not closed, so will 
`connection.createStatement()` be closed automatically?
   



##########
kernel/data-pipeline/dialect/mysql/src/main/java/org/apache/shardingsphere/data/pipeline/mysql/sqlbuilder/MySQLPipelineSQLBuilder.java:
##########
@@ -89,6 +89,11 @@ public Optional<String> buildCRC32SQL(final String 
schemaName, final String tabl
         return Optional.of(String.format("SELECT BIT_XOR(CAST(CRC32(%s) AS 
UNSIGNED)) AS checksum, COUNT(1) AS cnt FROM %s", quote(column), 
quote(tableName)));
     }
     
+    @Override
+    public String buildEstimateCountSQL(final String databaseName, final 
String schemaName, final String tableName) {
+        return String.format("SELECT TABLE_ROWS from INFORMATION_SCHEMA.TABLES 
WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s'", databaseName, 
getQualifiedTableName(schemaName, tableName));

Review Comment:
   `from` could be `FROM`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [shardingsphere] sandynz commented on a diff in pull request #24293: Improve table records count calculation in pipeline job

Reply via email to