lrsb commented on code in PR #3546:
URL: https://github.com/apache/amoro/pull/3546#discussion_r2086175221


##########
amoro-format-iceberg/src/main/java/org/apache/amoro/optimizing/plan/AbstractOptimizingPlanner.java:
##########
@@ -158,19 +158,23 @@ public List<RewriteStageTask> planTasks() {
     double maxInputSize = maxInputSizePerThread * availableCore;
     actualPartitionPlans = Lists.newArrayList();
     long actualInputSize = 0;
+    double avgThreadCost = actualInputSize / availableCore;
+    List<RewriteStageTask> tasks = Lists.newArrayList();
+
     for (PartitionEvaluator evaluator : evaluators) {
-      actualPartitionPlans.add((AbstractPartitionPlan) evaluator);
-      actualInputSize += evaluator.getCost();
+      AbstractPartitionPlan actualPartitionPlan = (AbstractPartitionPlan) 
evaluator;
+      List<RewriteStageTask> splitTasks =
+          actualPartitionPlan.splitTasks((int) (actualInputSize / 
avgThreadCost));

Review Comment:
   I see what you mean, will publish another review.
   
   Regarding the comment I think it should be placed in the implementation 
classes.



##########
amoro-format-iceberg/src/main/java/org/apache/amoro/optimizing/plan/AbstractOptimizingPlanner.java:
##########
@@ -158,19 +158,23 @@ public List<RewriteStageTask> planTasks() {
     double maxInputSize = maxInputSizePerThread * availableCore;
     actualPartitionPlans = Lists.newArrayList();
     long actualInputSize = 0;
+    double avgThreadCost = actualInputSize / availableCore;

Review Comment:
   > Is the change here mainly to correct the log at the end of this function?
   
   No the issue is different:
   - Let's say there's a PartitionEvaluator which output of splitTasks is empty 
(because, for example in Iceberg, it contains only one data file)
   - Let's also say its cost is > maxInputSize
   
   The result is that:
   - actualPartitionPlans will contain just that evaluator even if there are 
more in the initial list
   - Since split tasks output is empty nothing will be scheduled
   - If this keeps happening we get into a starvation like situation where the 
first evaluator keeps getting scheduled but nothing is actually executed even 
if the initial list there are schedulable evaluators
   
   > After the change, the tasks will be empty also, but the actualInputSize is 
the right value.
   
   No, the output after the change will be populated with the other partition 
evaluators
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to