klion26 commented on code in PR #3546:
URL: https://github.com/apache/amoro/pull/3546#discussion_r2084222084


##########
amoro-format-iceberg/src/main/java/org/apache/amoro/optimizing/plan/AbstractOptimizingPlanner.java:
##########
@@ -158,19 +158,23 @@ public List<RewriteStageTask> planTasks() {
     double maxInputSize = maxInputSizePerThread * availableCore;
     actualPartitionPlans = Lists.newArrayList();
     long actualInputSize = 0;
+    double avgThreadCost = actualInputSize / availableCore;
+    List<RewriteStageTask> tasks = Lists.newArrayList();
+
     for (PartitionEvaluator evaluator : evaluators) {
-      actualPartitionPlans.add((AbstractPartitionPlan) evaluator);
-      actualInputSize += evaluator.getCost();
+      AbstractPartitionPlan actualPartitionPlan = (AbstractPartitionPlan) 
evaluator;
+      List<RewriteStageTask> splitTasks =
+          actualPartitionPlan.splitTasks((int) (actualInputSize / 
avgThreadCost));

Review Comment:
   It is ok to change this currently, because the parameter for `splitTasks` is 
not taken into consideration, but maybe in the future we will change the logic 
in `TaskSplitter#splitTasks`
   
   Can we change the `actualInputSize / avgThreadCost` here to some other 
things like `evaluator.getCost() / avgThreadCostPre` -- but currently there is 
no `avgThreadCostPre` 
   
   Maybe we can add a comment here to let others know we need to change this 
here? what do you think about this



##########
amoro-format-iceberg/src/main/java/org/apache/amoro/optimizing/plan/AbstractOptimizingPlanner.java:
##########
@@ -158,19 +158,23 @@ public List<RewriteStageTask> planTasks() {
     double maxInputSize = maxInputSizePerThread * availableCore;
     actualPartitionPlans = Lists.newArrayList();
     long actualInputSize = 0;
+    double avgThreadCost = actualInputSize / availableCore;

Review Comment:
   I have a question about the change here.
   
   Is the change here mainly to correct the log at the end of this function? is 
my understanding right?
   
   Before the change, if the `tasks` is empty because there are no tasks after 
`actualPartitionPlan.splitTasks`, then there would be no actual optimization 
executed because we will check whether the `tasks` are empty or not in 
`AbstractOptimizingPlanner#isNecessary` and `OptimizingQueue#loadTaskRuntimes`.
   - After the change, the `tasks` will be empty also, but the 
`actualInputSize` is the right value.
   
   If this is the case, can we correct the calculation logic for 
`actualInputSize`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to