EmmyMiao87 commented on a change in pull request #8348:
URL: https://github.com/apache/incubator-doris/pull/8348#discussion_r839125817



##########
File path: fe/fe-core/src/main/java/org/apache/doris/statistics/StatisticsJobScheduler.java
##########
@@ -18,46 +18,243 @@
 package org.apache.doris.statistics;
 
 import org.apache.doris.catalog.Catalog;
+import org.apache.doris.catalog.Column;
+import org.apache.doris.catalog.Database;
+import org.apache.doris.catalog.KeysType;
+import org.apache.doris.catalog.OlapTable;
+import org.apache.doris.catalog.Table;
+import org.apache.doris.catalog.Type;
+import org.apache.doris.common.Config;
+import org.apache.doris.common.DdlException;
 import org.apache.doris.common.util.MasterDaemon;
+import org.apache.doris.statistics.StatisticsJob.JobState;
 
 import com.google.common.collect.Queues;
 
-import java.util.ArrayList;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
+import java.util.Map;
 import java.util.Queue;
+import java.util.Set;
 
-/*
-Schedule statistics job.
-  1. divide job to multi task
-  2. submit all task to StatisticsTaskScheduler
-Switch job state from pending to scheduling.
+/**
+ * Schedule statistics job.
+ * 1. divide job to multi task
+ * 2. submit all task to StatisticsTaskScheduler
+ * Switch job state from pending to scheduling.
  */
 public class StatisticsJobScheduler extends MasterDaemon {
+    private static final Logger LOG = LogManager.getLogger(StatisticsJobScheduler.class);
 
-    public Queue<StatisticsJob> pendingJobQueue = Queues.newLinkedBlockingQueue();
+    /**
+     * Different statistics need to be collected for the jobs submitted by users.
+     * if all statistics be collected at the same time, the cluster may be overburdened
+     * and normal query services may be affected. Therefore, we put the jobs into the queue
+     * and schedule them one by one, and finally divide each job to several subtasks and execute them.
+     */
+    public final Queue<StatisticsJob> pendingJobQueue;
 
     public StatisticsJobScheduler() {
         super("Statistics job scheduler", 0);
+        this.pendingJobQueue = Queues.newLinkedBlockingQueue(Config.cbo_max_statistics_job_num);
     }
 
     @Override
     protected void runAfterCatalogReady() {
-        // TODO
-        StatisticsJob pendingJob = pendingJobQueue.peek();
-        // step0: check job state again
-        // step1: divide statistics job to task
-        List<StatisticsTask> statisticsTaskList = divide(pendingJob);
-        // step2: submit
-        Catalog.getCurrentCatalog().getStatisticsTaskScheduler().addTasks(statisticsTaskList);
+        StatisticsJob pendingJob = this.pendingJobQueue.peek();
+        if (pendingJob != null) {
+            // step0: check job state again
+            JobState jobState = pendingJob.getJobState();
+            if (jobState == JobState.PENDING) {
+                try {
+                    // step1: divide statistics job to tasks
+                    List<StatisticsTask> tasks = this.divide(pendingJob);
+                    // step2: submit tasks
+                    Catalog.getCurrentCatalog().getStatisticsTaskScheduler().addTasks(tasks);

Review comment:
       However, the next job will not be scheduled until the current job has finished.
This means that only one job can run in the cluster at a time.
   
   In fact, we only need to control the concurrency of tasks. That is enough to limit the pressure that statistics collection puts on the cluster.
   So my suggestion is that the job scheduler is only responsible for moving the job from pending -> scheduling.
   The job is then removed from the queue.
   The task scheduler takes overall responsibility for concurrency control.
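   
   To make the suggestion concrete, here is a minimal sketch of the split, not the actual Doris code. `SketchJob`, `SketchJobScheduler`, `SketchTaskScheduler`, `runOnce`, `pollBatch` and `maxConcurrentTasks` are all hypothetical names, and tasks are plain strings for brevity. It assumes one job is handed over per scheduling round and that the task scheduler caps how many tasks it takes per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified stand-in for StatisticsJob.
class SketchJob {
    enum State { PENDING, SCHEDULING }

    State state = State.PENDING;
    final List<String> taskNames;

    SketchJob(List<String> taskNames) {
        this.taskNames = taskNames;
    }
}

// Hypothetical task scheduler: the only place where concurrency is limited.
class SketchTaskScheduler {
    private final Queue<String> taskQueue = new LinkedBlockingQueue<>();
    private final int maxConcurrentTasks; // assumed limit, not a real Doris config

    SketchTaskScheduler(int maxConcurrentTasks) {
        this.maxConcurrentTasks = maxConcurrentTasks;
    }

    void addTasks(List<String> tasks) {
        taskQueue.addAll(tasks);
    }

    // Take at most maxConcurrentTasks tasks for one execution round.
    List<String> pollBatch() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxConcurrentTasks) {
            String task = taskQueue.poll();
            if (task == null) {
                break;
            }
            batch.add(task);
        }
        return batch;
    }

    int queuedTasks() {
        return taskQueue.size();
    }
}

// Hypothetical job scheduler: it only moves a job from PENDING to SCHEDULING.
class SketchJobScheduler {
    final Queue<SketchJob> pendingJobQueue = new LinkedBlockingQueue<>();
    private final SketchTaskScheduler taskScheduler;

    SketchJobScheduler(SketchTaskScheduler taskScheduler) {
        this.taskScheduler = taskScheduler;
    }

    // One scheduling round: remove the head job instead of peeking at it,
    // so an unfinished job never blocks the jobs queued behind it.
    void runOnce() {
        SketchJob job = pendingJobQueue.poll();
        if (job == null || job.state != SketchJob.State.PENDING) {
            return;
        }
        taskScheduler.addTasks(job.taskNames);
        job.state = SketchJob.State.SCHEDULING;
    }
}
```

   With this shape, a second job is handed over on the next round even if the first job's tasks are still queued, and the load on the cluster depends only on the batch size chosen in the task scheduler.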
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


