[GitHub] [shardingsphere] sandynz commented on a diff in pull request #24293: Improve table records count calculation in pipeline job for MySQL

via GitHub Wed, 22 Feb 2023 01:10:51 -0800


sandynz commented on code in PR #24293:
URL: https://github.com/apache/shardingsphere/pull/24293#discussion_r1114028641



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,21 +172,48 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        Optional<String> estimatedCountSQL = pipelineSQLBuilder.buildEstimatedCountSQL(schemaName, actualTableName);

Review Comment:
   `estimatedCountSQL` could be renamed to `sql`.



##########
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/prepare/InventoryTaskSplitter.java:
##########
@@ -171,21 +172,48 @@ private long getTableRecordsCount(final InventoryIncrementalJobItemContext jobIt
         PipelineJobConfiguration jobConfig = jobItemContext.getJobConfig();
         String schemaName = dumperConfig.getSchemaName(new LogicTableName(dumperConfig.getLogicTableName()));
         String actualTableName = dumperConfig.getActualTableName();
-        // TODO with a large amount of data, count the full table will have performance problem
-        String sql = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType()).buildCountSQL(schemaName, actualTableName);
+        PipelineSQLBuilder pipelineSQLBuilder = PipelineTypedSPILoader.getDatabaseTypedService(PipelineSQLBuilder.class, jobConfig.getSourceDatabaseType());
+        Optional<String> estimatedCountSQL = pipelineSQLBuilder.buildEstimatedCountSQL(schemaName, actualTableName);
+        try {
+            if (estimatedCountSQL.isPresent()) {
+                long estimatedCount = getEstimatedCountSQLResult(dataSource, estimatedCountSQL.get());
+                return estimatedCount > 0 ? estimatedCount : getCountSQLResult(dataSource, pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName));
+            }
+            return getCountSQLResult(dataSource, pipelineSQLBuilder.buildCountSQL(schemaName, actualTableName));
+        } catch (final SQLException ex) {
+            String uniqueKey = dumperConfig.hasUniqueKey() ? dumperConfig.getUniqueKeyColumns().get(0).getName() : "";
+            throw new SplitPipelineJobByUniqueKeyException(dumperConfig.getActualTableName(), uniqueKey, ex);
+        }
+    }
+    
+    // TODO maybe need refactor after PostgreSQL support estimated count.
+    private long getEstimatedCountSQLResult(final DataSource dataSource, final String estimatedCountSQL) throws SQLException {

Review Comment:
   `getEstimatedCountSQLResult` could be renamed to `getEstimatedCount`.
   
   Likewise, `getCountSQLResult` could be renamed to `getCount`.
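
For illustration only, the fallback logic in the hunk above could read roughly as follows once the shorter names are applied. This is a minimal, database-free sketch and not the PR's actual code: the class `RecordsCountSketch` is hypothetical, and the two `LongSupplier` parameters are assumed stand-ins for the JDBC-backed `getEstimatedCount` and `getCount` helpers.

```java
import java.util.Optional;
import java.util.function.LongSupplier;

// Hypothetical sketch: the suppliers stand in for the JDBC queries
// that the real getEstimatedCount / getCount helpers would run.
final class RecordsCountSketch {

    // Prefer the estimated count when the dialect offers one and it is positive;
    // otherwise fall back to the exact count.
    static long getTableRecordsCount(final Optional<LongSupplier> getEstimatedCount, final LongSupplier getCount) {
        if (getEstimatedCount.isPresent()) {
            long estimatedCount = getEstimatedCount.get().getAsLong();
            return estimatedCount > 0 ? estimatedCount : getCount.getAsLong();
        }
        return getCount.getAsLong();
    }

    public static void main(final String[] args) {
        // Dialect with a usable estimate: the exact count is never evaluated.
        System.out.println(getTableRecordsCount(Optional.of(() -> 120L), () -> 100L));
        // Estimate of zero (for example, stale statistics): fall back to the exact count.
        System.out.println(getTableRecordsCount(Optional.of(() -> 0L), () -> 100L));
        // Dialect without an estimated-count query: use the exact count directly.
        System.out.println(getTableRecordsCount(Optional.empty(), () -> 100L));
    }
}
```

Passing the exact count as a supplier keeps the expensive full-table count lazy, mirroring the diff, where `buildCountSQL` is only executed when no positive estimate is available.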
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
