[
https://issues.apache.org/jira/browse/IMPALA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971922#comment-16971922
]
Jiawei Wang commented on IMPALA-9140:
-------------------------------------
Thanks for the detailed explanation! I learned a lot from the comments.
Here are some of my thoughts on this:
{quote}In short, async load requests only have a chance to run after all
already pending sync load requests finish in the above case (poolSize=1). This
is how I think the second pool balances the sync and async loads.
{quote}
When the threads in _TableLoadingMgr.startTableLoadingThreads_ pick up a task
from tableLoadingDeque_, they effectively transform the async load request into
a sync load request, because loadNextTable() is synchronous.
So the race actually happens between the pending sync load requests and the
async load requests held by the _numLoadingThreads_ threads of
_startTableLoadingThreads_.
For example, when _numLoadingThreads_=3, all 3 threads in tblLoadingPool_ can
be busy with loading tasks when a sync load request arrives and waits to be
loaded. However, the 3 daemon threads in
_TableLoadingMgr.startTableLoadingThreads_ can still hold 3 pending async load
requests (which are effectively sync, because the threads are blocking) that
race with any true sync jobs from DDL/DML.
But I do agree that the threads in _startTableLoadingThreads_ can provide some
level of balance between sync and async jobs.
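To make the race above concrete, here is a minimal sketch under simplifying assumptions. It is not the Impala code: the class name SubmitterRaceSketch, the single-thread pool, and the "busy"/"async"/"sync" labels are all hypothetical. It models one submitter thread that hands an async request to the loading pool and then blocks on it, so that request is already queued ahead of a sync request that arrives later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitterRaceSketch {
    // Returns the order in which the (hypothetical) loads were executed.
    static List<String> run() {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        // Stand-in for tblLoadingPool_ with a single thread (poolSize=1).
        ExecutorService loadingPool = Executors.newFixedThreadPool(1);
        CountDownLatch gate = new CountDownLatch(1);
        CountDownLatch asyncQueued = new CountDownLatch(1);
        try {
            // The only pool thread is busy until the gate opens.
            Callable<Void> busyLoad = () -> {
                gate.await();
                order.add("busy");
                return null;
            };
            Runnable asyncLoad = () -> order.add("async");
            Runnable syncLoad = () -> order.add("sync");
            loadingPool.submit(busyLoad);

            // Stand-in for one submitter thread: it takes an async request,
            // submits it, then blocks until the load finishes.
            Thread submitter = new Thread(() -> {
                Future<?> f = loadingPool.submit(asyncLoad);
                asyncQueued.countDown();
                try {
                    f.get();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            submitter.start();
            asyncQueued.await();

            // A "true" sync request (e.g. from DDL/DML) arrives afterwards.
            Future<?> sync = loadingPool.submit(syncLoad);
            gate.countDown();
            sync.get();
            submitter.join();
            return new ArrayList<>(order);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            loadingPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the async request runs before the later sync request, because the pool's work queue is FIFO and the submitter had already handed the request over. With N submitter threads, up to N such requests could be ahead of a new sync request.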
{quote}I think maybe a PriorityQueue is a better fit. And the priority can be
defined as sync load > prioritized load > background load. But it's different
from the current behavior. Async load requests may starve if there are always
new sync load requests jumping in. So I still think the current implementation
makes some sense.
{quote}
I think the PriorityQueue solution actually makes sense here. The starvation
scenario [~stigahuang] mentioned also exists in the current implementation,
because we can get many new sync load requests and there is no guarantee of
when an idle thread in _tblLoadingPool__ will pick up an async task.
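As a rough illustration of the PriorityQueue idea, the sketch below orders requests as sync load > prioritized load > background load, with FIFO order inside each level. The names LoadPrioritySketch, LoadPriority and LoadRequest are hypothetical and are not taken from the Impala code base. It does nothing to prevent starvation of low-priority requests; an aging scheme would be needed for that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class LoadPrioritySketch {
    // Hypothetical priority levels; a lower ordinal is served first.
    enum LoadPriority { SYNC, PRIORITIZED, BACKGROUND }

    // Hypothetical request type, not Impala's actual request class.
    static final class LoadRequest implements Comparable<LoadRequest> {
        private static final AtomicLong SEQ = new AtomicLong();
        final String tableName;
        final LoadPriority priority;
        // Creation sequence number, used to keep FIFO order within a level.
        final long seq = SEQ.getAndIncrement();

        LoadRequest(String tableName, LoadPriority priority) {
            this.tableName = tableName;
            this.priority = priority;
        }

        @Override
        public int compareTo(LoadRequest other) {
            int byPriority = priority.compareTo(other.priority);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    // Queues all requests, then returns table names in the order they
    // would be handed to a loading thread.
    static List<String> drainOrder(List<LoadRequest> requests) {
        PriorityBlockingQueue<LoadRequest> queue =
            new PriorityBlockingQueue<>(requests);
        List<String> order = new ArrayList<>();
        LoadRequest next;
        while ((next = queue.poll()) != null) {
            order.add(next.tableName);
        }
        return order;
    }

    public static void main(String[] args) {
        List<LoadRequest> requests = Arrays.asList(
            new LoadRequest("bg1", LoadPriority.BACKGROUND),
            new LoadRequest("sync1", LoadPriority.SYNC),
            new LoadRequest("prio1", LoadPriority.PRIORITIZED),
            new LoadRequest("sync2", LoadPriority.SYNC));
        System.out.println(drainOrder(requests));
    }
}
```

With a single queue like this, the loading threads could take requests directly, so no separate submitter pool would be needed in this model.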
I have only limited experience with this kind of multithreading.
Please correct me if anything is wrong.
> Get rid of the unnecessary load submitter thread pool in tblLoadingMgr
> ----------------------------------------------------------------------
>
> Key: IMPALA-9140
> URL: https://issues.apache.org/jira/browse/IMPALA-9140
> Project: IMPALA
> Issue Type: Bug
> Reporter: Vihang Karajgaonkar
> Priority: Major
>
> This JIRA is created as a followup on the discussion on
> https://gerrit.cloudera.org/#/c/14611 related to various pools used for
> loading tables.
> It looks like there are 2 pools of threads, both of size
> {{num_metadata_loading_threads}}. One pool is used to submit the load
> requests to another pool, {{tblLoadingPool_}}, which does the actual loading
> of the tables. I think we can get rid of the pool which submits the tasks,
> since submitting is not a very time-consuming operation and can be done
> synchronously (all it needs to do is submit the task to the front or back of
> the queue, depending on whether it's a prioritized load or a background
> load). This will simplify the loading code and reduce the number of
> unnecessary threads created by {{TblLoadingMgr}}.