Dinghang commented on code in PR #26639:
URL: https://github.com/apache/airflow/pull/26639#discussion_r1052929714


##########
airflow/executors/kubernetes_executor.py:
##########
@@ -62,6 +64,50 @@
 KubernetesWatchType = Tuple[str, str, Optional[str], Dict[str, str], str]
 
 
+def multi_threads_queue_process(
+    queue_size: int,
+    queue_type: str,
+    process_method: Callable,
+    max_threads: int,
+    log: Logger,
+    batch_size: Optional[int] = None,
+) -> None:
+    """
+    Helper method to enable multiple threads for processing queues used with the
+    kubernetes executor.
+
+    :param queue_size: the size of the queue getting processed
+    :param queue_type: the type of the queue
+    :param process_method: the real method processing the queue
+    :param max_threads: the max number of threads to be used
+    :param log: the logger used to report progress
+    :param batch_size: the max number of items we want to process in this round.
+                       If it's not set, the current queue size will be used.
+    """
+    if queue_size == 0:
+        log.info(f'There is no item to process in the {queue_type} queue.')
+        return
+
+    start_time = time.time()
+    log.info(f'Start processing {queue_type} queue with at most {max_threads} threads.')
+
+    batch_size = min(batch_size or queue_size, queue_size)
+    max_threads = min(max_threads, queue_size)
+
+    threads = []
+    quotient, remainder = divmod(batch_size, max_threads)
+    for i in range(max_threads):
+        sub_batch_size = quotient + 1 if i < remainder else quotient
+        t = Thread(target=process_method, args=[sub_batch_size])
+        threads.append(t)
+        t.start()
+    for t in threads:
+        t.join()

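The `divmod` pattern in the hunk above distributes a batch as evenly as possible across the worker threads: the first `remainder` threads each take one extra item. A minimal standalone sketch of just that splitting logic (the `split_batch` name is mine for illustration, not part of the PR):

```python
def split_batch(batch_size: int, max_threads: int) -> list:
    """Split batch_size items into max_threads near-equal sub-batches."""
    quotient, remainder = divmod(batch_size, max_threads)
    # The first `remainder` workers get quotient + 1 items, the rest get quotient.
    return [quotient + 1 if i < remainder else quotient for i in range(max_threads)]

print(split_batch(10, 3))  # → [4, 3, 3]
print(split_batch(5, 5))   # → [1, 1, 1, 1, 1]
```

Note the sub-batch sizes always sum back to `batch_size`, so no queue item is dropped or processed twice.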
Review Comment:
   Hi @dstandish, thanks for the reply. The idea is the same. IIRC, I was 
using ThreadPoolExecutor at the very beginning and then ran into some issues 
in production under heavier workloads. Unfortunately, I did not record the 
issue. With the current implementation there have been no issues, and it 
gives us more flexibility to make updates, so I would like to keep it.
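For comparison, the `ThreadPoolExecutor` alternative discussed above could look roughly like the sketch below. This is only an illustration of the rejected approach under my own assumptions (a dummy `process_method` that echoes its input), not code from the PR:

```python
from concurrent.futures import ThreadPoolExecutor


def process_method(sub_batch_size: int) -> int:
    # Stand-in for the real queue-processing callable; just echoes the size.
    return sub_batch_size


batch_size, max_threads = 10, 3
# Same even split as the PR's divmod logic.
quotient, remainder = divmod(batch_size, max_threads)
sizes = [quotient + 1 if i < remainder else quotient for i in range(max_threads)]

# map() preserves input order, so results line up with sizes.
with ThreadPoolExecutor(max_workers=max_threads) as pool:
    results = list(pool.map(process_method, sizes))

print(results)  # → [4, 3, 3]
```

Functionally this is equivalent to spawning and joining `Thread` objects by hand; the manual version trades the executor's convenience for direct control over thread lifecycle, which is the flexibility the comment refers to.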



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
