[
https://issues.apache.org/jira/browse/BEAM-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073871#comment-17073871
]
Luke Cwik commented on BEAM-8944:
---------------------------------
I was having difficulty in producing a solution that both created threads when
needed and also was able to clean up threads without emptying the pool
completely. Some ideas I had were:
* Have a base pool of threads that never "exit" unless shutdown happens
* Having a dedicated thread that checked queue size every N secs and generate
new threads for what was outstanding (lag in creation, maybe non issue if the
base pool size is like 8-16 threads)
* Whenever a worker gets an item from the queue, it checks queue size and
spawns a new worker thread if there are any items there (will get stuck if
**work** never completes)
* Only create threads on new submission with a certain amount of lag between.
All of these are worse then the idle worker queue. Would be good to know what
was expensive in that solution so that we could address that directly since the
cost might just be coming from the "timed" wait aspect.
> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---------------------------------------------------------------------------
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Affects Versions: 2.18.0
> Reporter: Yichi Zhang
> Priority: Critical
> Attachments: checkpoint-duration.png, profiling.png,
> profiling_one_thread.png, profiling_twelve_threads.png
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load
> tests.
>
> After some investigation, it appears to be caused by swapping the original
> ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is
> that python performance is worse with more threads on cpu-bounded tasks.
>
> A simple test for comparing the multiple thread pool executor performance:
>
> {code:python}
> def test_performance(self):
> def run_perf(executor):
> total_number = 1000000
> q = queue.Queue()
> def task(number):
> hash(number)
> q.put(number + 200)
> return number
> t = time.time()
> count = 0
> for i in range(200):
> q.put(i)
> while count < total_number:
> executor.submit(task, q.get(block=True))
> count += 1
> print('%s uses %s' % (executor, time.time() - t))
> with UnboundedThreadPoolExecutor() as executor:
> run_perf(executor)
> with futures.ThreadPoolExecutor(max_workers=1) as executor:
> run_perf(executor)
> with futures.ThreadPoolExecutor(max_workers=12) as executor:
> run_perf(executor)
> {code}
> Results:
> <apache_beam.utils.thread_pool_executor.UnboundedThreadPoolExecutor object at
> 0x7fab400dbe50> uses 268.160675049
> <concurrent.futures.thread.ThreadPoolExecutor object at 0x7fab40096290> uses
> 79.904583931
> <concurrent.futures.thread.ThreadPoolExecutor object at 0x7fab400dbe50> uses
> 191.179054976
> ```
> Profiling:
> UnboundedThreadPoolExecutor:
> !profiling.png!
> 1 Thread ThreadPoolExecutor:
> !profiling_one_thread.png!
> 12 Threads ThreadPoolExecutor:
> !profiling_twelve_threads.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)